Sequence determination in confined regions

ABSTRACT

A sequencing methodology is disclosed that allows a single DNA or RNA molecule or portion thereof to be sequenced directly and in substantially real time. The methodology involves engineering a polymerase and/or dNTPs with atomic and/or molecular tags that have a detectable property that is monitored by a detection system.

This application is a continuation of U.S. patent application Ser. No. 11/648,174, filed Dec. 29, 2006 and published as United States Patent Application Publication No. 2007/0172865 A1 on Jul. 26, 2007, which is a divisional of U.S. application Ser. No. 09/901,782, filed Jul. 9, 2001 and published as United States Patent Application Publication No. 2003/0064366 A1 on Apr. 3, 2003, which claims provisional priority to U.S. Provisional Patent Application No. 60/216,594, filed Jul. 7, 2000, all of which are incorporated herein by reference in their entirety.

BACKGROUND OF THE INVENTION

1. Background of the Invention

The present invention relates to a single-molecule sequencing apparatus and methods.

More particularly, the present invention relates to a single-molecule sequencing apparatus and methods using tagged polymerizing agents and/or tagged monomers where the tagged polymerizing agent and/or the tagged monomers undergo a change in a detectable property before, during and/or after monomer insertion into a growing polymer chain. The apparatus and methods are ideally suited for sequencing DNA, RNA, polypeptide, carbohydrate or similar bio-molecular sequences under near real-time or real-time conditions. The present invention also relates to a single-molecule sequencing apparatus and methods using tagged depolymerizing agents and/or tagged depolymerizable polymer where the tagged depolymerizing agent and/or the tagged depolymerizable polymer undergo a change in a detectable property before, during and/or after monomer removal from the depolymerizable polymer chain. The apparatus and methods are ideally suited for sequencing DNA, RNA, polypeptide, carbohydrate or similar bio-molecular sequences. The present invention also relates to detecting a signal evidencing interactions between the tagged polymerizing agent or depolymerizing agent and a tagged or untagged polymer subunit such as a monomer or collection of monomers, where the detected signal provides information about monomer order. In a preferred embodiment, the methods are carried out in real-time or near real-time.

2. Description of the Related Art

Overview of Conventional DNA Sequencing

The development of methods that allow one to quickly and reliably determine the order of bases or ‘sequence’ in a fragment of DNA is a key technical advance, the importance of which cannot be overstated. Knowledge of DNA sequence enables a greater understanding of the molecular basis of life. DNA sequence information provides scientists with information critical to a wide range of biological processes. The order of bases in DNA specifies the order of bases in RNA, the molecule within the cell that directly encodes the informational content of proteins. DNA sequence information is routinely used to deduce protein sequence information. Base order dictates DNA structure and its function, and provides a molecular program that can specify normal development, manifestation of a genetic disease, or cancer.

Knowledge of DNA sequence and the ability to manipulate these sequences has accelerated development of biotechnology and led to the development of molecular techniques that provide the tools to ask and answer important scientific questions. The polymerase chain reaction (PCR), an important biotechnique that facilitates sequence-specific detection of nucleic acid, relies on sequence information. DNA sequencing methods allow scientists to determine whether a change has been introduced into the DNA, and to assay the effect of the change on the biology of the organism, regardless of the type of organism that is being studied. Ultimately, DNA sequence information may provide a way to uniquely identify individuals.

In order to understand the DNA sequencing process, one must recall several facts about DNA. First, a DNA molecule is comprised of four bases, adenine (A), guanine (G), cytosine (C), and thymine (T). These bases interact with each other in very specific ways through hydrogen bonds, such that A interacts with T, and G interacts with C. These specific interactions between the bases are referred to as base-pairings. In fact, it is these base-pairings (and base stacking interactions) that stabilize double-stranded DNA. The two strands of a DNA molecule occur in an antiparallel orientation, where one strand is positioned in the 5′ to 3′ direction, and the other strand is positioned in the 3′ to 5′ direction. The terms 5′ and 3′ refer to the directionality of the DNA backbone, and are critical to describing the order of the bases. The convention for describing base order in a DNA sequence uses the 5′ to 3′ direction, and is written from left to right. Thus, if one knows the sequence of one DNA strand, the complementary sequence can be deduced.
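The deduction of a complementary strand from the pairing rules above can be illustrated with a minimal sketch. The function name and the example sequence are illustrative assumptions and are not part of the disclosure; the sketch applies only the A-T and G-C pairing and the antiparallel, 5′ to 3′ writing convention described above.

```python
# Base-pairing rules stated in the text: A pairs with T, G pairs with C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complementary_strand(seq_5_to_3):
    """Return the complementary strand, also written 5' to 3'.

    The two strands are antiparallel, so the input is read in reverse
    while each base is replaced by its pairing partner.
    """
    return "".join(PAIRS[base] for base in reversed(seq_5_to_3.upper()))


if __name__ == "__main__":
    # Hypothetical example sequence, chosen only for illustration.
    print(complementary_strand("ATGC"))
```

Applying the function twice returns the original sequence, which reflects the fact that either strand fully determines the other.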

Sanger DNA Sequencing (Enzymatic Synthesis)

Sanger sequencing is currently the most commonly used method to sequence DNA (Sanger et al., 1977). This method exploits several features of a DNA polymerase: its ability to make an exact copy of a DNA molecule, its directionality of synthesis (5′ to 3′), its requirement of a DNA strand (a ‘primer’) from which to begin synthesis, and its requirement for a 3′ OH at the end of the primer. If a 3′ OH is not available, then the DNA strand cannot be extended by the polymerase. If a dideoxynucleotide (ddNTP; ddATP, ddTTP, ddGTP, ddCTP), a base analog lacking a 3′ OH, is added into an enzymatic sequencing reaction, it is incorporated into the growing strand by the polymerase. However, once the ddNTP is incorporated, the polymerase is unable to add any additional bases to the end of the strand. Importantly, ddNTPs are incorporated by the polymerase into the DNA strand using the same base incorporation rules that dictate incorporation of natural nucleotides, where A specifies incorporation of T, and G specifies incorporation of C (and vice versa).
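The chain-termination principle described above can be sketched as follows. This is a simplified illustration under stated assumptions, not a model of an actual sequencing reaction: it assumes every possible termination position is represented in the product population, it ignores the primer, and the function names and example template are hypothetical.

```python
# Incorporation rules from the text: A specifies T, G specifies C (and vice versa).
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def extension_products(template_3_to_5):
    """List every ddNTP-terminated product for a template read 3' to 5'.

    The new strand grows 5' to 3', so its i-th base pairs with the i-th
    template base. Each product is a prefix of the full new strand whose
    last base stands for the incorporated, chain-terminating ddNTP.
    """
    new_strand = "".join(PAIRS[b] for b in template_3_to_5.upper())
    return [new_strand[:n] for n in range(1, len(new_strand) + 1)]


def read_ladder(products):
    """Recover the sequence by ordering products by size (as a gel or
    capillary separation would) and reading each terminal base."""
    return "".join(p[-1] for p in sorted(products, key=len))


if __name__ == "__main__":
    products = extension_products("TACG")  # hypothetical template
    print(products)
    print(read_ladder(products))
```

Because all products share the same 5′ end, ordering them by length places the terminal bases in sequence order.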

Fluorescent DNA Sequencing

A major advance in determining DNA sequence information occurred with the introduction of automated DNA sequencing machines (Smith et al., 1986). The automated sequencer is used to separate sequencing reaction products, detect and collect (via computer) the data from the reactions, and analyze the order of the bases to automatically deduce the base sequence of a DNA fragment. Automated sequencers detect extension products containing a fluorescent tag. Sequence read lengths obtained using an automated sequencer are dependent upon a variety of parameters, but typically range between 500 and 1,000 bases (3-18 hours of data collection). At maximum capacity an automated sequencer can collect data from 96 samples in parallel.

When dye-labeled terminator chemistry is used to detect the sequencing products, base identity is determined by the color of the fluorescent tag attached to the ddNTP. After the reaction is assembled and processed through the appropriate number of cycles (3-12 hours), the extension products are prepared for loading into a single lane on an automated sequencer (unincorporated, dye-labeled ddNTPs are removed and the reaction is concentrated; 1-2 hours). An advantage of dye-terminator chemistry is that extension products are visualized only if they terminate with a dye-labeled ddNTP; prematurely terminated products are not detected. Thus, reduced background noise typically results with this chemistry.

State-of-the-art dye-terminator chemistry uses four energy transfer fluorescent dyes (Rosenblum et al., 1997). These terminators include a fluorescein donor dye (6-FAM) linked to one of four different dichlororhodamine (dRhodamine) acceptor dyes. The dRhodamine acceptor dyes associated with the terminators are dichloro[R110], dichloro[R6G], dichloro[TAMRA] or dichloro[ROX], for the G-, A-, T- or C-terminators, respectively. The donor dye (6-FAM) efficiently absorbs energy from the argon ion laser in the automated sequencing machine and transfers that energy to the linked acceptor dye. The linker connecting the donor and acceptor portions of the terminator is optimally spaced to achieve essentially 100% efficient energy transfer. The fluorescence signals emitted from these acceptor dyes exhibit minimal spectral overlap and are collected by an ABI PRISM 377 DNA sequencer using 10 nm virtual filters centered at 540, 570, 595 and 625 nm, for G-, A-, T- or C-terminators, respectively. Thus, energy transfer dye-labeled terminators produce brighter signals and improve spectral resolution. These improvements result in more accurate DNA sequence information.
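The relationship between filter channel and base identity described above can be sketched as a simple lookup. The filter centers and base assignments are taken from the text; the strongest-channel rule, the function name, and the intensity values are illustrative assumptions, and an actual instrument's base caller also corrects for residual spectral overlap and other effects that are not modeled here.

```python
# Virtual filter centers (nm) and the terminator each one reports, per the text.
FILTER_CENTER_NM_TO_BASE = {540: "G", 570: "A", 595: "T", 625: "C"}


def call_base(intensities):
    """Call the base whose filter channel shows the strongest signal.

    `intensities` maps a filter center in nm to a measured intensity.
    Picking the maximum is a simplifying assumption for illustration.
    """
    strongest = max(intensities, key=intensities.get)
    return FILTER_CENTER_NM_TO_BASE[strongest]


if __name__ == "__main__":
    # Hypothetical intensities for a single peak.
    print(call_base({540: 0.10, 570: 0.90, 595: 0.20, 625: 0.05}))
```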

The predominant enzyme used in automated DNA sequencing reactions is a genetically engineered form of DNA polymerase I from Thermus aquaticus. This enzyme, AmpliTaq DNA Polymerase, FS, was optimized to more efficiently incorporate ddNTPs and to eliminate the 3′ to 5′ and 5′ to 3′ exonuclease activities. Replacing a naturally occurring phenylalanine at position 667 in T. aquaticus DNA polymerase with a tyrosine reduced the preferential incorporation of a dNTP, relative to a ddNTP (Tabor and Richardson, 1995; Reeve and Fuller, 1995). Thus, a single hydroxyl group within the polymerase is responsible for discrimination between dNTPs and ddNTPs. The 3′ to 5′ exonuclease activity, which enables the polymerase to remove a mis-incorporated base from the newly replicated DNA strand (proofreading activity), was eliminated because it also allows the polymerase to remove an incorporated ddNTP. The 5′ to 3′ exonuclease activity was eliminated because it removes bases from the 5′ end of the reaction products. Since the reaction products are size separated during gel electrophoresis, interpretable sequence data is only obtained if the reaction products share a common endpoint. More specifically, the primer defines the 5′ end of the extension product and the incorporated, color-coded ddNTP defines base identity at the 3′ end of the molecule. Thus, conventional DNA sequencing involves analysis of a population of DNA molecules sharing the same 5′ endpoint, but differing in the location of the ddNTP at the 3′ end of the DNA chain.

Genome Sequencing

Very often a researcher needs to determine the sequence of a DNA fragment that is larger than the 500-1,000 base average sequencing read length. Not surprisingly, strategies to accomplish this have been developed. These strategies are divided into two major classes, random or directed, and strategy choice is influenced by the size of the fragment to be sequenced.

In random or shotgun DNA sequencing, a large DNA fragment (typically one larger than 20,000 base pairs) is broken into smaller fragments that are inserted into a cloning vector. It is assumed that the sum of information contained within these smaller clones is equivalent to that contained within the original DNA fragment. Numerous smaller clones are randomly selected, DNA templates are prepared for sequencing reactions, and primers that will base-pair with the vector DNA sequence bordering the insert are used to begin the sequencing reaction (2-7 days for a 20 kbp insert). Subsequently, the quality of each base call is examined (manually or automatically via software (PHRED, Ewing et al., 1998); 1-10 minutes per sequence reaction), and the sequence of the original DNA fragment is reconstructed by computer assembly of the sequences obtained from the smaller DNA fragments. Based on the time estimates provided, if a shotgun sequencing strategy is used, a 20 kbp insert is expected to be completed in 3-10 days. This strategy was extensively used to determine the sequence of ordered fragments that represent the entire human genome (see the United States Government website nhgri.nih, the HGP sublink at http://www.nhgri.nih.gov/HGP/). However, this random approach is typically not sufficient to complete sequence determination, since gaps in the sequence often remain after computer assembly. A directed strategy (described below) is usually used to complete the sequence project.

A directed or primer-walking sequencing strategy can be used to fill in gaps remaining after the random phase of large-fragment sequencing, and as an efficient approach for sequencing smaller DNA fragments. This strategy uses DNA primers that anneal to the template at a single site and act as a start site for chain elongation. This approach requires knowledge of some sequence information to design the primer. The sequence obtained from the first reaction is used to design the primer for the next reaction and these steps are repeated until the complete sequence is determined. Thus, a primer-based strategy involves repeated sequencing steps from known into unknown DNA regions; the process minimizes redundancy and does not require additional cloning steps. However, this strategy requires the synthesis of a new primer for each round of sequencing.
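The iterative nature of primer walking can be illustrated with a short simulation. This is a sketch under stated assumptions rather than a laboratory protocol: a string stands in for the template, a substring lookup stands in for primer annealing and a sequencing reaction, and the function names, read length, primer length and example template are all hypothetical.

```python
def sequence_from(template, primer, read_len):
    """Stand-in for one sequencing reaction.

    Returns up to `read_len` bases immediately downstream of the primer
    site. The primer must match the template at a single site, as the
    strategy requires.
    """
    if template.count(primer) != 1:
        raise ValueError("primer must anneal at exactly one site")
    start = template.find(primer) + len(primer)
    return template[start:start + read_len]


def primer_walk(template, first_primer, read_len=8, primer_len=4):
    """Walk from known into unknown sequence, one primer per round."""
    assembled = first_primer
    primer = first_primer
    rounds = 0
    # Each productive round adds at least one base, so len(template)
    # rounds is a safe upper bound on the number of iterations.
    for _ in range(len(template)):
        read = sequence_from(template, primer, read_len)
        if not read:
            break  # reached the end of the template
        assembled += read
        rounds += 1
        # Design the next primer from the end of the newest read.
        primer = assembled[-primer_len:]
    return assembled, rounds


if __name__ == "__main__":
    template = "GATTACAGGCTTAACCGTAGCATCGA"  # hypothetical template
    print(primer_walk(template, "GATT"))
```

Each round needs a newly designed primer, which is the cost of the strategy noted above.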

The necessity of designing and synthesizing new primers, coupled with the expense and the time required for their synthesis, has limited the routine application of primer-walking for sequencing large DNA fragments. Researchers have proposed using a library of short primers to eliminate the requirement for custom primer synthesis (Studier, 1989; Siemieniak and Slightom, 1990; Kieleczawa et al., 1992; Kotler et al., 1993; Burbelo and Iadarola, 1994; Hardin et al., 1996; Raja et al., 1997; Jones and Hardin, 1998a, b; Ball et al., 1998; Mei and Hardin, 2000; Kraltcheva and Hardin, 2001). The availability of a primer library minimizes primer waste, since each primer is used to prime multiple reactions, and allows immediate access to the next sequencing primer.

One of the original goals of the Human Genome Project was to complete sequence determination of the entire human genome by 2005 (see the United States Government website nhgri.nih, the HGP sublink at http://www.nhgri.nih.gov/HGP/). However, the plan is ahead of schedule and a ‘working draft’ of the human genome was published in February 2001 (Venter et al., 2001, “International Human Genome Sequencing Consortium 2001”). Due to technological advances in several disciplines, the completed genome sequence is expected in 2003, two years ahead of schedule. Progress in all aspects involving DNA manipulation (especially manipulation and propagation of large DNA fragments), evolution of faster and better DNA sequencing methods (see abrf.org at http://www.abrf.org), development of computer hardware and software capable of manipulating and analyzing the data (bioinformatics), and automation of procedures associated with generating and analyzing DNA sequences (engineering) are responsible for this accelerated time frame.

Single-Molecule DNA Sequencing

Conventional DNA sequencing strategies and methods are reliable, but time, labor, and cost intensive. To address these issues, some researchers are investigating fluorescence-based, single-molecule sequencing methods that use enzymatic degradation, followed by single-dNMP detection and identification. The DNA polymer containing fluorescently-labeled nucleotides is digested by an exonuclease, and the labeled nucleotides are detected and identified by flow cytometry (Davis et al., 1991; Davis et al., 1992; Goodwin et al., 1997; Keller et al., 1996; Sauer et al., 1999; Werner et al., 1999). This method requires that the DNA strand is synthesized to contain the fluorescently-labeled base(s). This requirement limits the length of sequence that can be determined, and increases the number of manipulations that must be performed before any sequence data is obtained. A related approach proposes to sequentially separate single (unlabeled) nucleotides from a strand of DNA, confine them in their original order in a solid matrix, and detect the spectroscopic emission of the separated nucleotides to reconstruct DNA sequence information (Ulmer, 1997; Mitsis and Kwagh, 1999; Dapprich, 1999). This is the approach that is being developed by Praelux, Inc., a company with a goal to develop single-molecule DNA sequencing. Theoretically, this latter method should not be as susceptible to length limitations as the former enzymatic degradation method, but it does require numerous manipulations before any sequence information can be obtained.

Li-cor, Inc. is developing an enzyme synthesis based strategy for single-molecule sequencing as set forth in PCT application WO 00/36151. The Li-cor method involves multiply modifying each dNTP by attaching a fluorescent tag to the γ-phosphate and a quenching moiety to another site on the dNTP, preferably on the base. The quenching moiety is added to prevent emission from the fluorescent tag attached to an unincorporated dNTP. Upon incorporation the fluorescent tag and quenching moiety are separated, resulting in emission from the tag. The tag (contained on the pyrophosphate) flows away from the polymerase active site, but the modified (quenched) base becomes part of the DNA polymer.

Although some single-molecule sequencing systems have been disclosed, many of them anticipate or require base modification. See, e.g., Patent Application Serial Numbers WO 01/16375 A2, WO 01/23610 A2, WO 01/25480, WO 00/06770, WO 99/05315, WO 00/60114, WO 00/36151, WO 00/36512, and WO 00/70073, incorporated herein by reference. Base modifications may distort DNA structure (which normally consists of A-form DNA nearest the enzyme active site; Li et al., 1998a). Since the dNTP and approximately 7 of the 3′-nearest bases in the newly synthesized strand contact internal regions of the polymerase (Li et al., 1998a), the A-form DNA may be important for maximizing minor groove contacts between the enzyme and the DNA. If the DNA structure is affected due to base modification, enzyme fidelity and/or function may be altered. Thus, there is still a need in the art for a fast and efficient enzymatic DNA sequencing system for single molecular DNA sequences.

SUMMARY OF THE INVENTION

Single-Molecule Sequencing

The present invention provides a polymerizing agent modified with at least one molecular or atomic tag located at or near, associated with or covalently bonded to a site on the polymerizing agent, where a detectable property of the tag undergoes a change before, during and/or after monomer incorporation. The monomers can be organic, inorganic or bio-organic monomers such as nucleotides for DNA, RNA, mixed DNA/RNA sequences, amino acids, monosaccharides, synthetic analogs of naturally occurring nucleotides, synthetic analogs of naturally occurring amino acids or synthetic analogs of naturally occurring monosaccharides, synthetic organic or inorganic monomers, or the like.

The present invention provides a depolymerizing agent modified with at least one molecular or atomic tag located at or near, associated with or covalently bonded to a site on the depolymerizing agent, where a detectable property of the tag undergoes a change before, during and/or after monomer removal. The polymers can be DNA, RNA, mixed DNA/RNA sequences containing only naturally occurring nucleotides or a mixture of naturally occurring nucleotides and synthetic analogs thereof, polypeptide sequences containing only naturally occurring amino acids or a mixture of naturally occurring amino acids and synthetic analogs thereof, polysaccharide or carbohydrate sequences containing only naturally occurring monosaccharides or a mixture of naturally occurring monosaccharides and synthetic analogs thereof, or polymers containing synthetic organic or inorganic monomers, or the like.

The present invention also provides a system that enables detecting a signal corresponding to a detectable property evidencing changes in interactions between a synthesizing/polymerizing agent or a depolymerizing agent (molecule) and its substrates (monomers or depolymerizable polymers) and decoding the signal into monomer order specific information or monomer sequence information, preferably in real-time or near real-time.

Single Site Tagged Polymerase

The present invention provides a polymerase modified with at least one molecular or atomic tag located at or near, associated with, or covalently bonded to a site on the polymerase, where a detectable property of the tag undergoes a change before, during and/or after monomer incorporation. The monomers can be nucleotides for DNA, RNA or mixed DNA/RNA monomers or synthetic analogs polymerizable by the polymerase.

The present invention provides an exonuclease modified with at least one molecular or atomic tag located at or near, associated with, or covalently bonded to a site on the exonuclease, where a detectable property of the tag undergoes a change before, during and/or after monomer release. The polymers can be DNA, RNA or mixed DNA/RNA sequences comprised of naturally occurring monomers or synthetic analogs depolymerizable by the exonuclease.

The present invention provides a polymerase modified with at least one molecular or atomic tag located at or near, associated with, or covalently bonded to a site that undergoes a conformational change before, during and/or after monomer incorporation, where the tag has a first detection propensity when the polymerase is in a first conformational state and a second detection propensity when the polymerase is in a second conformational state.

The present invention provides a polymerase modified with at least one chromophore located at or near, associated with, or covalently bonded to a site that undergoes a conformational change before, during and/or after monomer incorporation, where an intensity and/or frequency of emitted light of the chromophore has a first value when the polymerase is in a first conformational state and a second value when the polymerase is in a second conformational state.

The present invention provides a polymerase modified with at least one fluorescently active molecular tag located at or near, associated with, or covalently bonded to a site that undergoes a conformational change before, during and/or after monomer incorporation, where the tag has a first fluorescence propensity when the polymerase is in a first conformational state and a second fluorescence propensity when the polymerase is in a second conformational state.

The present invention provides a polymerase modified with a molecular tag located at or near, associated with, or covalently bonded to a site that undergoes a conformational change before, during and/or after monomer incorporation, where the tag is substantially detectable when the polymerase is in a first conformational state and substantially non-detectable when the polymerase is in a second conformational state or substantially non-detectable when the polymerase is in the first conformational state and substantially detectable when the polymerase is in the second conformational state.

The present invention provides a polymerase modified with at least one molecular or atomic tag located at or near, associated with, or covalently bonded to a site that interacts with a tag on the released pyrophosphate group, where the polymerase tag has a first detection propensity before interacting with the tag on the released pyrophosphate group and a second detection propensity when interacting with the tag on the released pyrophosphate group. In a preferred embodiment, this change in detection propensity is cyclical, occurring as each pyrophosphate group is released.

The present invention provides a polymerase modified with at least one chromophore located at or near, associated with, or covalently bonded to a site that interacts with a tag on the released pyrophosphate group, where an intensity and/or frequency of light emitted by the chromophore has a first value before the chromophore interacts with the tag on the released pyrophosphate and a second value when interacting with the tag on the released pyrophosphate group. In a preferred embodiment, this change in detection propensity is cyclical, occurring as each pyrophosphate group is released.

The present invention provides a polymerase modified with at least one fluorescently active molecular tag located at or near, associated with, or covalently bonded to a site that interacts with a tag on the released pyrophosphate group, where the polymerase tag changes from a first state prior to release of the pyrophosphate group to a second state as the pyrophosphate group diffuses away from the site of release. In a preferred embodiment, this change in detection propensity is cyclical, occurring as each pyrophosphate group is released.

The present invention provides a polymerase modified with a molecular tag located at or near, associated with, or covalently bonded to a site that interacts with a tag on the released pyrophosphate group, where the polymerase tag changes from a substantially detectable state prior to pyrophosphate release to a substantially non-detectable state when the polymerase tag interacts with the tag on the pyrophosphate group after group release, or changes from a substantially non-detectable state prior to pyrophosphate release to a substantially detectable state when the polymerase tag interacts with the tag on the pyrophosphate group after group release.

Multiple Site Tagged Polymerizing or Depolymerizing Agents

The present invention provides a monomer polymerizing agent modified with at least one pair of molecular and/or atomic tags located at or near, associated with, or covalently bonded to sites on the polymerizing agent, where a detectable property of at least one tag of the pair undergoes a change before, during and/or after monomer incorporation or where a detectable property of at least one tag of the pair undergoes a change before, during and/or after monomer incorporation due to a change in inter-tag interaction.

The present invention provides a depolymerizing agent modified with at least one pair of molecular and/or atomic tags located at or near, associated with, or covalently bonded to sites on the depolymerizing agent, where a detectable property of at least one tag of the pair undergoes a change before, during and/or after monomer release or where a detectable property of at least one tag of the pair undergoes a change before, during and/or after monomer release due to a change in inter-tag interaction.

The present invention provides a monomer polymerizing agent modified with at least one pair of molecular and/or atomic tags located at or near, associated with, or covalently bonded to sites on the polymerizing agent, where a detectable property of at least one tag of the pair has a first value when the polymerizing agent is in a first state and a second value when the polymerizing agent is in a second state, where the polymerizing agent changes from the first state to the second state and back to the first state during a monomer incorporation cycle.

The present invention provides a depolymerizing agent modified with at least one pair of molecular and/or atomic tags located at or near, associated with or covalently bonded to sites on the depolymerizing agent, where a detectable property of at least one tag of the pair has a first value when the depolymerizing agent is in a first state and a second value when the depolymerizing agent is in a second state, where the depolymerizing agent changes from the first state to the second state and back to the first state during a monomer release cycle.

Preferably, the first and second states are different so that a change in the detected signal occurs. However, a no-change result may evidence other properties of the polymerizing media or depolymerizing media.

Multiple Site Tagged Polymerase

The present invention provides a polymerase modified with at least one pair of molecular tags located at or near, associated with, or covalently bonded to sites, where at least one of the tags undergoes a change during monomer incorporation, where a detectable property of the pair has a first value when the polymerase is in a first state and a second value when the polymerase is in a second state, where the polymerase changes from the first state to the second state and back to the first state during a monomer incorporation cycle.

The present invention provides a polymerase modified with at least one pair of molecular tags located at or near, associated with or covalently bonded to sites, where at least one of the tags undergoes conformational change during monomer incorporation, where the detectable property of the pair has a first value when the polymerase is in a first conformational state and a second value when the polymerase is in a second conformational state, where the polymerase changes from the first state to the second state and back to the first state during a monomer incorporation cycle.

The present invention provides a polymerase modified with at least one pair of molecules or atoms located at or near, associated with or covalently bonded to sites, where at least one of the tags undergoes conformational change during monomer incorporation, where the pair interact to form a chromophore when the polymerase is in a first conformational state or a second conformational state, where the polymerase changes from the first state to the second state and back to the first state during a monomer incorporation cycle.

The present invention provides a polymerase modified with at least one pair of molecular tags located at or near, associated with or covalently bonded to sites, where at least one of the tags undergoes conformational change during monomer incorporation, where the tags have a first fluorescence propensity when the polymerase is in a first conformational state and a second fluorescence propensity when the polymerase is in a second conformational state, where the polymerase changes from the first state to the second state and back to the first state during a monomer incorporation cycle.

The present invention provides a polymerase modified with at least one pair of molecular tags located at or near, associated with or covalently bonded to sites, where at least one of the tags undergoes conformational change during monomer incorporation, where the pair is substantially active when the polymerase is in a first conformational state and substantially inactive when the polymerase is in a second conformational state or substantially inactive when the polymerase is in the first conformational state and substantially active when the polymerase is in the second conformational state, where the polymerase changes from the first state to the second state and back to the first state during a monomer incorporation cycle.

The present invention provides a polymerase modified with at least one pair of molecular tags located at or near, associated with, or covalently bonded to sites, where at least one of the tags undergoes a change during and/or after pyrophosphate release during the monomer incorporation process, where a detectable property of the pair has a first value when the tag is in a first state prior to pyrophosphate release and a second value when the tag is in a second state during and/or after pyrophosphate release, where the tag changes from its first state to its second state and back to its first state during a monomer incorporation cycle.

The present invention provides a polymerase modified with at least one pair of molecular tags located at or near, associated with or covalently bonded to sites, where at least one of the tags undergoes a change in position due to a conformational change in the polymerase during the pyrophosphate release process, where the detectable property of the pair has a first value when the tag is in its first position and a second value when the tag is in its second position, where the tag changes from its first position to its second position and back to its first position during a release cycle.

The present invention provides a polymerase modified with at least one pair of molecules or atoms located at or near, associated with or covalently bonded to sites, where the tags change relative separation due to a conformational change in the polymerase during pyrophosphate release, where the tags interact to form a chromophore having a first emission profile when the tags are a first distance apart and a second profile when the tags are a second distance apart, where the separation distance changes from its first state to its second state and back to its first state during a pyrophosphate release cycle.

The present invention provides a polymerase modified with at least one pair of molecular tags located at or near, associated with or covalently bonded to sites, where the tags change relative separation due to a conformational change in the polymerase during pyrophosphate release, where the tags have a first fluorescence propensity when the polymerase is in a first conformational state and a second fluorescence propensity when the polymerase is in a second conformational state, where the propensity changes from its first value to its second value and back again during a pyrophosphate release cycle.

The present invention provides a polymerase modified with at least one pair of molecular tags located at or near, associated with or covalently bonded to sites, where the tags change relative separation due to a conformational change in the polymerase during pyrophosphate release, where the pair is substantially fluorescently active when the tags have a first separation and substantially fluorescently inactive when the tags have a second separation, or substantially fluorescently inactive when the tags have the first separation and substantially fluorescently active when the tags have the second separation, where the fluorescence activity undergoes one cycle during a pyrophosphate release cycle.

It should be recognized that when a property changes from a first state to a second state and back again, then the property undergoes a cycle. Preferably, the first and second states are different so that a change in the detected signal occurs. However, a no-change result may evidence other properties of the polymerizing medium or depolymerizing medium.

Methods Using Tagged Polymerizing Agent

The present invention provides a method for determining when a monomer is incorporated into a growing molecular chain comprising the step of monitoring a detectable property of an atomic or molecular tag, where the tag is located at or near, associated with, or covalently bonded to a site on a polymerizing agent, where the detectable property of the tag undergoes a change before, during and/or after monomer incorporation.

The present invention provides a method for determining when a monomer is incorporated into a growing molecular chain comprising the step of monitoring a detectable property of an atomic or molecular tag, where the tag is located at or near, associated with, or covalently bonded to a site on a polymerizing agent, where the detectable property has a first value when the agent is in a first state and a second value when the agent is in a second state, where the agent changes from the first state to the second state and back to the first state during a monomer incorporation cycle.

Preferably, the first and second states are different so that a change in the detected signal occurs. However, a no-change result may evidence other properties of the polymerizing medium.

Methods Using Tagged Polymerase

The present invention provides a method for determining when or whether a monomer is incorporated into a growing molecular chain comprising the step of monitoring a detectable property of a tag, where the tag is located at or near, associated with, or covalently bonded to a site on a polymerase, where the site undergoes a change during monomer incorporation and where the detectable property has a first value when the polymerase is in a first state and a second value when the polymerase is in a second state, where the values signify that the site has undergone the change and where the polymerase changes from the first state to the second state and back to the first state during a monomer incorporation cycle.

The present invention provides a method for determining when or whether a monomer is incorporated into a growing molecular chain comprising the step of monitoring a detectable property of a tag, where the tag is located at or near, associated with, or covalently bonded to a site on a polymerase, where the site undergoes a conformational change during monomer incorporation and where the detectable property has a first value when the polymerase is in a first conformational state and a second value when the polymerase is in a second conformational state, where the values signify that the site has undergone the change and where the polymerase changes from the first state to the second state and back to the first state during a monomer incorporation cycle.

The present invention provides a method for determining when or whether a monomer is incorporated into a growing molecular chain comprising the steps of exposing a tagged polymerase to light and monitoring an intensity and/or frequency of fluorescent light emitted by the tagged polymerase, where the tagged polymerase comprises a polymerase including a tag located at or near, associated with, or covalently bonded to a site that undergoes conformational change during monomer incorporation and where the tag emits fluorescent light at a first intensity and/or frequency when the polymerase is in a first conformational state and a second intensity and/or frequency when the polymerase is in a second conformational state, where the change in intensities and/or frequencies signifies that the site has undergone the change and where the polymerase changes from the first state to the second state and back to the first state during a monomer incorporation cycle.

The present invention also provides the above methods using a plurality of tagged polymerases permitting parallel and/or massively parallel sequencing simultaneously. Such parallelism can be used to ensure confidence. Such parallelism can also be used to quickly detect the degree of homology in DNA sequences for a given gene across species or to quickly screen patient DNA for specific genetic traits or to quickly screen DNA sequences for polymorphisms.
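By way of illustration only, one way parallel reads could be combined "to ensure confidence" is a per-position majority vote. The following is a minimal sketch, not a method stated in this disclosure; the function name `consensus`, the assumption that the reads are already aligned to equal length, and the use of the agreeing fraction as a confidence figure are all hypothetical.

```python
from collections import Counter


def consensus(reads):
    """Majority-vote base call per position across pre-aligned reads.

    Hypothetical helper: assumes every read covers the same template
    region and has the same length (no alignment is performed here).
    Returns the consensus string and, per position, the fraction of
    reads that agree with the consensus call.
    """
    if not reads:
        return "", []
    length = len(reads[0])
    if any(len(read) != length for read in reads):
        raise ValueError("reads must be pre-aligned to equal length")
    calls, support = [], []
    for column in zip(*reads):
        base, count = Counter(column).most_common(1)[0]
        calls.append(base)
        support.append(count / len(reads))
    return "".join(calls), support


if __name__ == "__main__":
    # Three reads of the same template from three tagged polymerases;
    # the third read disagrees at one position.
    sequence, agreement = consensus(["ACGT", "ACGT", "ACTT"])
    print(sequence, agreement)
```

Positions with low agreement could be flagged for re-reading; how ties are broken is left unspecified in this sketch.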

The present invention also provides a method for determining if or when a monomer is incorporated into a growing DNA chain associated with a polymerase, where a tag is located on the polymerase so that as the pyrophosphate group is released after base incorporation and prior to its diffusion away from the polymerase, the polymerase tag interacts with the tag on the pyrophosphate causing a change in a detectable property of one of the tags or a detectable property associated with both tags in the case of a fluorescent pair.

Preferably, the first and second states are different so that a change in the detected signal occurs. However, a no-change result may evidence other properties of the polymerizing media.

Apparatuses Using Tagged Polymerizing Agent

The present invention provides a single-molecule sequencing apparatus comprising a substrate having deposited thereon at least one tagged polymerizing agent. The tagged polymerizing agent can be placed on the surface of the substrate in an appropriate polymerizing medium or the polymerizing agent can be confined in a region, area, well, groove, channel or other similar structure on the substrate. The substrate can also include a monomer region, area, well, groove, channel, reservoir or other similar structure on the substrate connected to the polymerizing agent confinement structure by at least one connecting structure capable of supporting molecular transport of monomer to the polymerizing agent such as a channel, groove, or the like. Alternatively, the substrate can include structures containing each monomer, where each structure is connected to the polymerizing agent confinement structure by a connecting structure capable of supporting molecular transport of monomer to the polymerizing agent. The substrate can also be subdivided into a plurality of polymerizing agent confinement structures, where each structure is connected to a monomer reservoir. Alternatively, each polymerizing agent confinement structure can have its own monomer reservoir or sufficient monomer reservoirs so that each reservoir contains a specific monomer.

The present invention also provides a single-molecule sequencing apparatus comprising a substrate having at least one tagged polymerizing agent attached to the surface of the substrate by a molecular tether or linking group, where one end of the tether or linking group is bonded to a site on the surface of the substrate and the other end is bonded to a site on the polymerizing agent or bonded to a site on a molecule strongly associated with the polymerizing agent. In this context, the term “bonded to” means chemical and/or physical interactions sufficient to maintain the polymerizing agent within a given region of the substrate under normal polymerizing conditions. The chemical and/or physical interactions include, without limitation, covalent bonding, ionic bonding, hydrogen bonding, apolar bonding, attractive electrostatic interactions, dipole interactions, or any other electrical or quantum mechanical interaction sufficient in toto to maintain the polymerizing agent in a desired region of the substrate. The substrate having tethered tagged polymerizing agent attached thereon can be placed in a container containing an appropriate polymerizing medium. Alternatively, the tagged polymerizing agent can be tethered or anchored on or within a region, area, well, groove, channel or other similar structure on the substrate capable of being filled with an appropriate polymerizing medium. The substrate can also include a monomer region, area, well, groove, channel or other similar structure on the substrate connected to the polymerizing agent structure by at least one connecting structure capable of supporting molecular transport of monomer to the polymerizing agent. Alternatively, the substrate can include structures containing each monomer, where each structure is connected to the polymerizing agent structure by a connecting structure capable of supporting molecular transport of monomer to the polymerizing agent. The substrate can also be subdivided into a plurality of polymerizing agent structures each having at least one tethered polymerizing agent, where each structure is connected to a monomer reservoir. Alternatively, each polymerizing agent structure can have its own monomer reservoir or sufficient monomer reservoirs, one reservoir of each specific monomer.

The monomers for use in these apparatuses include, without limitation, dNTPs, tagged dNTPs, ddNTPs, tagged ddNTPs, amino acids, tagged amino acids, monosaccharides, tagged monosaccharides or appropriate mixtures or combinations thereof depending on the type of polymer being sequenced.

Apparatus Using Tagged Polymerase

The present invention provides a single-molecule sequencing apparatus comprising a substrate having deposited thereon at least one tagged polymerase. The tagged polymerase can be placed on the surface of the substrate in an appropriate polymerizing medium or the polymerase can be confined in a region, area, well, groove, channel or other similar structure on the substrate capable of being filled with an appropriate polymerizing medium. The substrate can also include a monomer region, area, well, groove, channel or other similar structure on the substrate connected to the polymerase confinement structure by at least one connecting structure capable of supporting molecular transport of monomer to the polymerase. Alternatively, the substrate can include structures containing each monomer, where each structure is connected to the polymerase confinement structure by a connecting structure capable of supporting molecular transport of the monomer to the polymerase in the polymerase confinement structures. The substrate can also be subdivided into a plurality of polymerase confinement structures, where each structure is connected to a monomer reservoir. Alternatively, each polymerase confinement structure can have its own monomer reservoir or four reservoirs, each reservoir containing a specific monomer.

The present invention also provides a single-molecule sequencing apparatus comprising a substrate having at least one tagged polymerase attached to the surface of the substrate by a molecular tether or linking group, where one end of the tether or linking group is bonded to a site on the surface of the substrate and the other end is bonded (either directly or indirectly) to a site on the polymerase or bonded to a site on a molecule strongly associated with the polymerase. In this context, the term “bonded to” means chemical and/or physical interactions sufficient to maintain the polymerase within a given region of the substrate under normal polymerizing conditions. The chemical and/or physical interactions include, without limitation, covalent bonding, ionic bonding, hydrogen bonding, apolar bonding, attractive electrostatic interactions, dipole interactions, or any other electrical or quantum mechanical interaction sufficient in toto to maintain the polymerase in its desired region. The substrate having tethered tagged polymerizing agent attached thereon can be placed in a container containing an appropriate polymerizing medium. Alternatively, the tagged polymerizing agent can be tethered or anchored on or within a region, area, well, groove, channel or other similar structure on the substrate capable of being filled with an appropriate polymerizing medium. The substrate can also include a monomer region, area, well, groove, channel or other similar structure on the substrate connected to the polymerase structure by at least one channel. Alternatively, the substrate can include structures containing each monomer, where each structure is connected to the polymerase structure by a connecting structure that supports molecular transport of the monomer to the polymerase in the polymerase confinement structures. The substrate can also be subdivided into a plurality of polymerase structures each having at least one tethered polymerase, where each structure is connected to a monomer reservoir. Alternatively, each polymerase structure can have its own monomer reservoir or four reservoirs, each reservoir containing a specific monomer.

The monomers for use in these apparatuses include, without limitation, dNTPs, tagged dNTPs, ddNTPs, tagged ddNTPs, or mixtures or combinations thereof.

Methods Using the Single-Molecule Sequencing Apparatuses

The present invention provides a method for single-molecule sequencing comprising the steps of supplying a plurality of monomers to a tagged polymerizing agent confined on or tethered to a substrate and monitoring a detectable property of the tag over time. The method can also include a step of relating changes in the detectable property to the occurrence (timing) of monomer addition and/or to the identity of each incorporated monomer and/or to the near simultaneous determination of the sequence of incorporated monomers.

The present invention provides a method for single-molecule sequencing comprising the steps of supplying a plurality of monomers to a tagged polymerizing agent confined on or tethered to a substrate, exposing the tagged polymerizing agent to light either continuously or periodically and measuring an intensity and/or frequency of fluorescent light emitted by the tag over time. The method can further comprise relating the changes in the measured intensity and/or frequency of emitted fluorescent light from the tag over time to the occurrence (timing) of monomer addition and/or to the identity of each incorporated monomer and/or to the near simultaneous determination of the sequence of the incorporated monomers.

The present invention provides a method for single-molecule sequencing comprising the steps of supplying a plurality of monomers to a tagged polymerase confined on or tethered to a substrate and monitoring a detectable property of the tag over time. The method can also include a step of relating changes in the detectable property over time to the occurrence (timing) of monomer addition and/or to the identity of each incorporated monomer and/or to the near simultaneous determination of the sequence of the incorporated monomers.

The present invention provides a method for single-molecule sequencing comprising the steps of supplying a plurality of monomers to a tagged polymerase confined on a substrate, exposing the tagged polymerase to light continuously or periodically and measuring an intensity and/or frequency of fluorescent light emitted by the tagged polymerase over time. The method can further comprise relating changes in the measured intensity and/or frequency of emitted fluorescent light from the tag over time to the occurrence (timing) of monomer addition and/or to the identity of each incorporated monomer and/or to the near simultaneous determination of the sequence of the incorporated monomers.
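The step of relating changes in a measured intensity over time to the timing of monomer addition can be sketched, purely for illustration, as a threshold test on a sampled intensity trace. The sketch below is an assumption-laden example and not the detection scheme of this disclosure: the function name `find_incorporation_events`, the fixed `threshold` parameter, and the idea that the second state is the higher-intensity one are all hypothetical.

```python
def find_incorporation_events(trace, threshold):
    """Locate first-state -> second-state -> first-state cycles in a trace.

    Hypothetical helper: `trace` is a sequence of intensity samples and
    `threshold` separates the first (low) state from the second (high)
    state. Each returned pair is (index where the signal rose to or above
    the threshold, index where it fell back below it). An excursion that
    is still above the threshold at the end of the trace is not reported,
    because the cycle has not completed.
    """
    events = []
    start = None
    for index, value in enumerate(trace):
        if value >= threshold and start is None:
            start = index  # entered the second state
        elif value < threshold and start is not None:
            events.append((start, index))  # returned to the first state
            start = None
    return events


if __name__ == "__main__":
    samples = [1, 1, 5, 6, 1, 1, 7, 1, 8]
    for begin, end in find_incorporation_events(samples, threshold=3):
        print(f"cycle from sample {begin} to sample {end}")
```

A practical detector would likely need noise filtering and a data-driven threshold; the duration of each cycle (end minus start) is the kind of quantity that could be related to monomer identity.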

Cooperatively Tagged Systems

The present invention provides cooperatively tagged polymerizing agents and tagged monomers, where a detectable property of at least one of the tags changes when the tags interact before, during and/or after monomer insertion. In one preferred embodiment, the tag on the polymerase is positioned such that the tags interact before, during and/or after each monomer insertion. In the case of tags that are released from the monomers after monomer insertion, such as β and/or γ phosphate tagged dNTPs (i.e., the tags reside on the β and/or γ phosphate groups), the tag on the polymerizing agent can be designed to interact with the tag on the monomer only after the tag is released from the polymerizing agent after monomer insertion. Tag placement within a polymerizing agent can be optimized to enhance interaction between the polymerase and dNTP tags by attaching the polymerase tag to sites on the polymerase that move during an incorporation event, changing the relative separation of the two tags, or optimized to enhance interaction between the polymerase tag and the tag on the pyrophosphate as it is released during base incorporation and prior to its diffusion away from the polymerizing agent.

The present invention provides cooperatively tagged polymerizing agents and tagged monomers, where a detectable property of at least one of the tags changes when the tags are within a distance sufficient to cause a measurable change in the detectable property. If the detectable property is fluorescence induced in one tag by energy transfer to the other tag or due to one tag quenching the fluorescence of the other tag or causing a measurable change in the fluorescence intensity and/or frequency, the measurable change is caused by bringing the tags into close proximity to each other, i.e., decreasing the distance separating the tags. Generally, the distance needed to cause a measurable change in the detectable property is within (less than or equal to) about 100 Å, preferably within about 50 Å, particularly within about 25 Å, especially within about 15 Å and most preferably within about 10 Å. Of course, one skilled in the art will recognize that a distance sufficient to cause a measurable change in a detectable property of a tag will depend on many parameters including the location of the tag, the nature of the tag, the solvent system, external fields, excitation source intensity and frequency band width, temperature, pressure, etc.
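To give a sense of why separations on the order of 10 Å to 100 Å matter for energy transfer between tags, the standard Förster relation E = 1 / (1 + (r/R0)^6) can be evaluated at the distances listed above. This is only an illustrative sketch: the disclosure does not name the Förster model, and the Förster radius of 50 Å used as a default here is an assumed, hypothetical value that in practice depends on the particular donor-acceptor pair and its environment.

```python
def fret_efficiency(distance_angstrom, forster_radius_angstrom=50.0):
    """Energy-transfer efficiency from the Forster relation.

    E = 1 / (1 + (r / R0) ** 6), where r is the tag separation and R0 is
    the separation at which transfer is 50% efficient. The default R0 of
    50 angstroms is an assumption chosen for illustration only.
    """
    if distance_angstrom < 0 or forster_radius_angstrom <= 0:
        raise ValueError("distance must be >= 0 and Forster radius > 0")
    ratio = distance_angstrom / forster_radius_angstrom
    return 1.0 / (1.0 + ratio ** 6)


if __name__ == "__main__":
    # Distances taken from the ranges recited in the text above.
    for separation in (100.0, 50.0, 25.0, 15.0, 10.0):
        efficiency = fret_efficiency(separation)
        print(f"{separation:6.1f} A -> efficiency {efficiency:.4f}")
```

Under the assumed R0, the sixth-power dependence makes the efficiency fall steeply as the separation grows past R0, which is consistent with the preference for smaller separations expressed above.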

The present invention provides a tagged polymerizing agent and tagged monomer precursor(s), where an intensity and/or frequency of fluorescent light emitted by at least one tag changes when the tags interact before, during and/or after monomer insertion.

The present invention provides cooperatively tagged depolymerizing agents and tagged depolymerizable polymer, where a detectable property of at least one of the tags changes when the tags interact before, during and/or after monomer release. The tag on the depolymerizing agent can be designed so that the tags interact before, during and/or after each monomer release.

The present invention provides cooperatively tagged depolymerizing agents and tagged polymers, where a detectable property of at least one of the tags changes when the tags are within a distance sufficient to cause a measurable change in the detectable property. If the detectable property is fluorescence induced in one tag by energy transfer to the other tag or due to one tag quenching the fluorescence of the other tag or causing a measurable change in the fluorescence intensity and/or frequency, the measurable change is caused by bringing two tags into close proximity to each other, i.e., decreasing the distance separating the tags. Generally, the distance needed to cause a measurable change in the detectable property is within (less than or equal to) about 100 Å, preferably within about 50 Å, particularly within about 25 Å, especially within about 15 Å and most preferably within about 10 Å. Of course, one skilled in the art will recognize that a distance sufficient to cause a measurable change in a detectable property of a tag will depend on many parameters including the location of the tag, the nature of the tag, the solvent system, external fields, excitation source intensity and frequency band width, temperature, pressure, etc.

The present invention provides a tagged depolymerizing agent and a tagged polymer, where an intensity and/or frequency of fluorescent light emitted by at least one tag changes when the tags interact before, during and/or after monomer release.

Cooperatively Tagged Systems Using a Polymerase

The present invention provides cooperatively tagged polymerase and tagged monomers, where a detectable property of at least one of the tags changes when the tags interact before, during and/or after monomer insertion. The tag on the polymerase can be designed so that the tags interact before, during and/or after each monomer insertion. In the case of tags that are released from the monomers after monomer insertion, such as β and/or γ phosphate tagged dNTPs (i.e., the tags reside on the β and/or γ phosphate groups), the tag on the polymerizing agent can be designed to interact with the tag on the monomer only after the tag is released from the polymerizing agent after monomer insertion. In the first case, the polymerase tag must be located on a site of the polymerase which allows the polymerase tag to interact with the monomer tag during the monomer insertion process (initial binding and bonding into the growing polymer). In the second case, the polymerase tag must be located on a site of the polymerase which allows the polymerase tag to interact with the monomer tag now on the released pyrophosphate prior to its diffusion away from the polymerase and into the polymerizing medium.

The present invention provides cooperatively tagged polymerase and tagged monomers, where a detectable property of at least one of the tags changes when the tags are within a distance sufficient, or in close proximity, to cause a measurable change in the detectable property. If the detectable property is fluorescence induced in one tag by energy transfer to the other tag or due to one tag quenching the fluorescence of the other tag or causing a measurable change in the fluorescence intensity and/or frequency, the measurable change is caused by bringing two tags into close proximity to each other, i.e., decreasing the distance separating the tags. Generally, the distance or close proximity is a distance between about 100 Å and about 10 Å. Alternatively, the distance is less than or equal to about 100 Å, preferably less than or equal to about 50 Å, particularly less than or equal to about 25 Å, especially less than or equal to about 15 Å and most preferably less than or equal to about 10 Å. Of course, one skilled in the art will recognize that a distance sufficient to cause a measurable change in a detectable property of a tag will depend on many parameters including the location of the tags, the nature of the tags, the solvent system (polymerizing medium), external fields, excitation source intensity and frequency band width, temperature, pressure, etc.

The present invention provides a tagged polymerase and tagged monomer precursors, where the tags form a fluorescently active pair such as a donor-acceptor pair and an intensity and/or frequency of fluorescent light emitted by at least one tag (generally the acceptor tag in donor-acceptor pairs) changes when the tags interact.

The present invention provides a tagged polymerase and tagged monomer precursors, where the tags form a fluorescently active pair such as a donor-acceptor pair and an intensity and/or frequency of fluorescent light emitted by at least one tag (generally the acceptor tag in donor-acceptor pairs) changes when the tags are within a distance sufficient, or in close proximity, to change the intensity and/or frequency of the fluorescent light. Generally, the distance or close proximity is a distance between about 100 Å and about 10 Å. Alternatively, the distance is less than or equal to about 100 Å, preferably less than or equal to about 50 Å, particularly less than or equal to about 25 Å, especially less than or equal to about 15 Å and most preferably less than or equal to about 10 Å. Of course, one skilled in the art will recognize that a distance sufficient to cause a measurable change in a detectable property of a tag will depend on many parameters including the location of the tag, the nature of the tag, the solvent system, external fields, excitation source intensity and frequency band width, temperature, pressure, etc.

The present invention provides a single-molecule sequencing apparatus comprising a container having at least one tagged polymerase confined on or tethered to an interior surface thereof and having a solution containing a plurality of tagged monomers in contact with the interior surface.

Molecular Data Stream Reading Methods and Apparatus

The present invention provides a method for single-molecule sequencing comprising the steps of supplying a plurality of tagged monomers to a tagged polymerase confined on an interior surface of a container, exposing the tagged polymerase to light and measuring an intensity and/or frequency of fluorescent light emitted by the tagged polymerase during each successive monomer addition or insertion into a growing polymer chain. The method can further comprise relating the measured intensity and/or frequency of emitted fluorescent light to incorporation events and/or to the identification of each inserted or added monomer, resulting in a near real-time or real-time readout of the sequence of a growing nucleic acid sequence (a DNA sequence, an RNA sequence or a mixed DNA/RNA sequence).

The present invention also provides a system for retrieving stored information comprising a molecule having a sequence of known elements representing a data stream, a single-molecule sequencer comprising a polymerase having at least one tag associated therewith, an excitation source adapted to excite at least one tag on the polymerase, and a detector adapted to detect a response from the excited tag on the polymerase, where the response from the at least one tag changes during polymerization of a complementary sequence of elements and the change in response represents a content of the data stream.

The present invention also provides a system for determining sequence information from a single molecule comprising a molecule having a sequence of known elements, a single-molecule sequencer comprising a polymerase having at least one tag associated therewith, an excitation source adapted to excite at least one tag on the polymerase, and a detector adapted to detect a response from the excited tag on the polymerase, where the response from at least one tag changes during polymerization of a complementary sequence of elements representing the element sequence of the molecule.

The present invention also provides a system for determining sequence information from a single molecule comprising a molecule having a sequence of known elements, a single-molecule sequencer comprising a polymerase having at least one fluorescent tag associated therewith, an excitation light source adapted to excite at least one fluorescent tag on the polymerase and/or monomer and a fluorescent light detector adapted to detect at least an intensity of emitted fluorescent light from at least one fluorescent tag on the polymerase and/or monomer, where the signal intensity changes each time a new nucleotide or nucleotide analog is polymerized into a complementary sequence and either the duration of the emission or lack of emission or the wavelength range of the emitted light evidences the particular nucleotide or nucleotide analog polymerized into the sequence so that at the completion of the sequencing the data stream is retrieved.

The present invention also provides a system for storing and retrieving data comprising a sequence of nucleotides or nucleotide analogs representing a given data stream; a single-molecule sequencer comprising a polymerase having at least one fluorescent tag covalently attached thereto; an excitation light source adapted to excite the at least one fluorescent tag on the polymerase and/or monomer; and a fluorescent light detector adapted to detect emitted fluorescent light from at least one fluorescent tag on the polymerase and/or monomer, where at least one fluorescent tag emits or fails to emit fluorescent light each time a new nucleotide or nucleotide analog is polymerized into a complementary sequence and either the duration of the emission or lack of emission or the wavelength range of the emitted light evidences the particular nucleotide or nucleotide analog polymerized into the sequence so that at the completion of the sequencing the data stream is retrieved.
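As a non-limiting illustration of how a sequence of nucleotides might represent a data stream, the sketch below maps each pair of bits to one base and reverses the mapping on readout. The two-bits-per-base table, the names `encode` and `decode`, and the assumption that the sequencer output has already been converted to the stored strand (rather than its complement) are all hypothetical; the disclosure does not specify an encoding.

```python
# Hypothetical two-bits-per-base mapping, chosen only for illustration.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Convert a byte string into a nucleotide sequence (4 bases per byte)."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(sequence: str) -> bytes:
    """Recover the byte string from a sequence produced by `encode`."""
    if len(sequence) % 4 != 0:
        raise ValueError("sequence length must be a multiple of 4 bases")
    bits = "".join(BASE_TO_BITS[base] for base in sequence)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


if __name__ == "__main__":
    stored = encode(b"hi")
    print(stored)
    print(decode(stored))
```

A real storage scheme would likely add error detection and avoid sequence features that are difficult to read, neither of which is attempted in this sketch.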

The term monomer as used herein means any compound that can be incorporated into a growing molecular chain by a given polymerase. Such monomers include, without limitation, naturally occurring nucleotides (e.g., ATP, GTP, TTP, UTP, CTP, dATP, dGTP, dTTP, dUTP, dCTP, synthetic analogs), precursors for each nucleotide, non-naturally occurring nucleotides and their precursors or any other molecule that can be incorporated into a growing polymer chain by a given polymerase. Additionally, monomers include amino acids (natural or synthetic) for protein or protein analog synthesis, monosaccharides for carbohydrate synthesis, or monomers for other monomeric syntheses.

The term polymerase as used herein means any molecule or molecular assembly that can polymerize a set of monomers into a polymer having a predetermined sequence of the monomers, including, without limitation, naturally occurring polymerases or reverse transcriptases, mutated naturally occurring polymerases or reverse transcriptases, where the mutation involves the replacement of one or more or many amino acids with other amino acids, the insertion or deletion of one or more or many amino acids from the polymerases or reverse transcriptases, or the conjugation of parts of one or more polymerases or reverse transcriptases, non-naturally occurring polymerases or reverse transcriptases. The term polymerase also embraces synthetic molecules or molecular assemblies that can polymerize a polymer having a pre-determined sequence of monomers, or any other molecule or molecular assembly that may have additional sequences that facilitate purification and/or immobilization and/or molecular interaction of the tags, and that can polymerize a polymer having a pre-determined or specified or templated sequence of monomers.

Single Site Tagged Polymerizing or Depolymerizing Agents

The present invention provides a composition comprising a polymerizing agent including at least one molecular and/or atomic tag located at or near, associated with or covalently bonded to a site on the agent, where a detectable property of the tag undergoes a change before, during and/or after monomer incorporation.

The present invention provides a composition comprising a polymerizing agent including at least one molecular and/or atomic tag located at or near, associated with or covalently bonded to a site on the agent, where a detectable property has a first value when the polymerase is in a first state and a second value when the polymerase is in a second state during monomer incorporation.

The present invention provides a composition comprising a depolymerizing agent including at least one molecular and/or atomic tag located at or near, associated with or covalently bonded to a site on the agent, where a detectable property of the tag undergoes a change before, during and/or after monomer removal.

The present invention provides a composition comprising a depolymerizing agent including at least one molecular and/or atomic tag located at or near, associated with or covalently bonded to a site on the agent, where a detectable property has a first value when the polymerase is in a first state and a second value when the polymerase is in a second state during monomer removal.

Single Site Tagged Polymerase

The present invention provides a composition comprising a polymerase including at least one molecular and/or atomic tag located at or near, associated with or covalently bonded to a site on the polymerase, where a detectable property of the tag undergoes a change before, during and/or after monomer incorporation.

The present invention provides a composition comprising a polymerase including at least one molecular and/or atomic tag located at or near, associated with or covalently bonded to a site on the polymerase, where a detectable property has a first value when the polymerase is in a first state and a second value when the polymerase is in a second state during monomer incorporation.

The present invention provides a composition comprising an exonuclease including at least one molecular and/or atomic tag located at or near, associated with or covalently bonded to a site on the agent, where a detectable property of the tag undergoes a change before, during and/or after monomer removal.

The present invention provides a composition comprising an exonuclease including at least one molecular and/or atomic tag located at or near, associated with or covalently bonded to a site on the agent, where a detectable property has a first value when the polymerase is in a first state and a second value when the polymerase is in a second state during monomer removal.

The present invention provides a composition comprising an enzyme modified to produce a detectable response prior to, during and/or after interaction with an appropriately modified monomer, where the monomers are nucleotides, nucleotide analogs, amino acids, amino acid analogs, monosaccharides, monosaccharide analogs or mixtures or combinations thereof.

The present invention provides a composition comprising a polymerase including at least one molecular tag located at or near, associated with or covalently bonded to a site that undergoes conformational change during monomer incorporation, where the tag has a first detection propensity when the polymerase is in a first conformational state and a second detection propensity when the polymerase is in a second conformational state.

The present invention provides a composition comprising a polymerase including at least one chromophore located at or near, associated with or covalently bonded to a site that undergoes conformational change during monomer incorporation, where an intensity and/or frequency of emitted light of the tag has a first value when the polymerase is in a first conformational state and a second value when the polymerase is in a second conformational state.

The present invention provides a composition comprising a polymerase including at least one molecular tag located at or near, associated with or covalently bonded to a site that undergoes conformational change during monomer incorporation, where the tag has a first fluorescence propensity when the polymerase is in a first conformational state and a second fluorescence propensity when the polymerase is in a second conformational state.

The present invention provides a composition comprising a polymerase including a molecular tag located at or near, associated with or covalently bonded to a site that undergoes conformational change during monomer incorporation, where the tag is substantially active when the polymerase is in a first conformational state and substantially inactive when the polymerase is in a second conformational state or substantially inactive when the polymerase is in the first conformational state and substantially active when the polymerase is in the second conformational state.

Multiple Site Tagged Polymerizing and Depolymerizing Agents

The present invention provides a composition comprising a polymerizing agent including at least one pair of molecular tags located at or near, associated with or covalently bonded to a site of the agent, where a detectable property of at least one of the tags undergoes a change before, during and/or after monomer incorporation.

The present invention provides a composition comprising a polymerizing agent including at least one pair of molecular tags located at or near, associated with or covalently bonded to a site of the agent, where a detectable property has a first value when the polymerase is in a first state and a second value when the polymerase is in a second state during monomer incorporation.

The present invention provides a composition comprising a depolymerizing agent including at least one pair of molecular tags located at or near, associated with or covalently bonded to a site of the agent, where a detectable property of at least one of the tags undergoes a change before, during and/or after monomer removal.

The present invention provides a composition comprising a depolymerizing agent including at least one pair of molecular tags located at or near, associated with or covalently bonded to a site of the agent, where a detectable property has a first value when the polymerase is in a first state and a second value when the polymerase is in a second state during monomer removal.

Multiple Site Tagged Polymerase

The present invention provides a composition comprising a polymerase including at least one pair of molecular tags located at or near, associated with or covalently bonded to a site of the polymerase, where a detectable property of at least one of the tags undergoes a change before, during and/or after monomer incorporation.

The present invention provides a composition comprising a polymerase including at least one pair of molecular tags located at or near, associated with or covalently bonded to a site of the polymerase, where a detectable property has a first value when the polymerase is in a first state and a second value when the polymerase is in a second state during monomer incorporation.

The present invention provides a composition comprising an exonuclease including at least one pair of molecular tags located at or near, associated with or covalently bonded to a site of the polymerase, where a detectable property of at least one of the tags undergoes a change before, during and/or after monomer removal.

The present invention provides a composition comprising an exonuclease including at least one pair of molecular tags located at or near, associated with or covalently bonded to a site of the polymerase, where a detectable property has a first value when the polymerase is in a first state and a second value when the polymerase is in a second state during monomer removal.

The present invention provides a composition comprising a polymerase including at least one pair of molecular tags located at or near, associated with or covalently bonded to a site that undergoes conformational change during monomer incorporation, where the detectable property of the pair has a first value when the polymerase is in a first conformational state and a second value when the polymerase is in a second conformational state.

The present invention provides a composition comprising a polymerase including at least one pair of molecules or atoms located at or near, associated with or covalently bonded to a site that undergoes conformational change during monomer incorporation, where the pair interacts to form a chromophore when the polymerase is in a first conformational state or a second conformational state.

The present invention provides a composition comprising a polymerase including at least one pair of molecular tags located at or near, associated with or covalently bonded to a site that undergoes conformational change during monomer incorporation, where the tags have a first fluorescence propensity when the polymerase is in a first conformational state and a second fluorescence propensity when the polymerase is in a second conformational state.

The present invention provides a composition comprising a polymerase including at least one pair of molecular tags located at or near, associated with or covalently bonded to a site that undergoes conformational change during monomer incorporation, where the pair is substantially active when the polymerase is in a first conformational state and substantially inactive when the polymerase is in a second conformational state or substantially inactive when the polymerase is in the first conformational state and substantially active when the polymerase is in the second conformational state.

Methods Using Tagged Polymerase

The present invention provides a method for determining when a monomer is incorporated into a growing molecular chain comprising the steps of monitoring a detectable property of a tag, where the tag is located at or near, associated with or covalently bonded to a site on a polymerase or associated with or covalently bonded to a site on the monomer, where the site undergoes a change during monomer incorporation and where the detectable property has a first value when the polymerase is in a first state and a second value when the polymerase is in a second state and cycles from the first value to the second value during each monomer addition.

The present invention provides a method for determining when a monomer is incorporated into a growing molecular chain comprising the steps of monitoring a detectable property of a tag, where the tag is located at or near, associated with or covalently bonded to a site on a polymerase or associated with or covalently bonded to a site on the monomer, where the site undergoes a conformational change during monomer incorporation and where the detectable property has a first value when the polymerase is in a first conformational state and a second value when the polymerase is in a second conformational state and cycles from the first value to the second value during each monomer addition.

The present invention provides a method for determining when a monomer is incorporated into a growing molecular chain comprising the steps of exposing a tagged polymerase to light, monitoring an intensity and/or frequency of fluorescent light emitted by the tagged polymerase and/or monomer, where the tagged polymerase comprises a polymerase including a tag located at or near, associated with or covalently bonded to a site that undergoes conformational change during monomer incorporation or associated with or covalently bonded to a site on the monomer and where the tag emits fluorescent light at a first intensity and/or frequency when the polymerase is in a first conformational state and a second intensity and/or frequency when the polymerase is in a second conformational state and cycles from the first value to the second value during each monomer addition.
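As an illustration only, the cycling of a detectable property between a first value and a second value can be counted in a recorded trace. The following is a minimal sketch, assuming the trace is a list of normalized intensity samples and that two thresholds separate the first state from the second; the function name, the thresholds and the sample values are hypothetical and are not taken from this disclosure.

```python
def count_incorporation_cycles(trace, low_threshold, high_threshold):
    """Count first-state -> second-state -> first-state cycles in a trace.

    Two thresholds (hysteresis) are used so that noise near a single
    cut-off is not counted as several events. All values are hypothetical.
    """
    events = 0
    in_second_state = False
    for value in trace:
        if not in_second_state and value >= high_threshold:
            # Property moved from the first value to the second value.
            in_second_state = True
        elif in_second_state and value <= low_threshold:
            # Property returned to the first value: one full cycle.
            in_second_state = False
            events += 1
    return events


# Hypothetical normalized intensity samples containing two full cycles.
example_trace = [0.1, 0.2, 0.9, 0.95, 0.15, 0.1, 0.85, 0.2, 0.5, 0.1]
print(count_incorporation_cycles(example_trace, 0.3, 0.7))
```

Each counted cycle would correspond to one candidate monomer addition; a practical system would also need noise filtering and timing information, which this sketch omits.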

Single-Molecule Sequencing Apparatus Using Tagged Polymerase

The present invention provides a composition comprising a single-molecule sequencing apparatus comprising a substrate having a chamber or chip surface in which at least one tagged polymerase is confined and a plurality of chambers, each of which includes a specific monomer and a plurality of channels interconnecting the chambers, where each replication complex is sufficiently distant to enable data collection from each complex individually.

The present invention provides a method for single-molecule sequencing comprising the steps of supplying a plurality of monomers to a tagged polymerase confined on a substrate, exposing the tagged polymerase to light and measuring an intensity and/or frequency of fluorescent light emitted by the tagged polymerase. The method can further comprise the step of relating the measured intensity and/or frequency of emitted fluorescent light to incorporation of a specific monomer into a growing DNA chain.

Cooperatively Tagged Monomers and Tagged Polymerizing Agent

The present invention provides a composition comprising a cooperatively tagged polymerizing agent and tagged monomers, where a detectable property of at least one of the tags changes when the tags interact.

The present invention provides a composition comprising a cooperatively tagged depolymerizing agent and tagged depolymerizable monomers, where a detectable property of at least one of the tags changes when the tags interact.

Cooperatively Tagged Monomers and Tagged Polymerase

The present invention provides a composition comprising a cooperatively tagged polymerase and tagged monomers, where a detectable property of at least one of the tags changes when the tags interact.

The present invention provides a composition comprising a cooperatively tagged polymerase and tagged monomers, where a detectable property of at least one of the tags changes when the tags are within a distance sufficient to cause a change in the intensity and/or frequency of emitted fluorescent light.

The present invention provides a composition comprising a tagged polymerase and tagged monomer precursors, where an intensity and/or frequency of fluorescent light emitted by at least one tag changes when the tags interact.

The present invention provides a composition comprising a tagged polymerase and tagged monomer precursors, where an intensity and/or frequency of fluorescent light emitted by at least one tag changes when the tags are within a distance sufficient to cause a change in the intensity and/or frequency of emitted fluorescent light.

The present invention provides a single-molecule sequencing apparatus comprising a container having at least one tagged polymerase confined on an interior surface thereof and having a solution containing a plurality of tagged monomers in contact with the interior surface or a subset of tagged monomers and a subset of untagged monomers which together provide all monomer precursors for polymerization.

The present invention provides a method for single-molecule sequencing comprising the steps of supplying a plurality of tagged monomers to a tagged polymerase confined on an interior surface of a container, exposing the tagged polymerase to light and measuring an intensity and/or frequency of fluorescent light emitted by the tagged polymerase. The method can further comprise relating the measured intensity and/or frequency of emitted fluorescent light to incorporation of a specific monomer into a growing DNA chain.

The present invention provides a system for retrieving stored information comprising: (a) a molecule having a sequence of elements representing a data stream; (b) a single-molecule sequencer comprising a polymerase having at least one tag associated therewith; (c) an excitation source adapted to excite the at least one tag on the polymerase; and (d) a detector adapted to detect a response from the tag on the polymerase or on the monomers; where the response from at least one tag changes during polymerization of a complementary sequence of elements and the change in response represents a data stream content.

The present invention provides a system for determining sequence information from a single-molecule comprising: (a) a molecule having a sequence of elements; (b) a single-molecule sequencer comprising a polymerase having at least one tag associated therewith; (c) an excitation source adapted to excite at least one tag on the polymerase or on the monomers; and (d) a detector adapted to detect a response from the tag on the polymerase; where the response from at least one tag changes during polymerization of a complementary sequence of elements representing the element sequence of the molecule.

The present invention provides a system for determining sequence information from an individual molecule comprising: (a) a molecule having a sequence of elements; (b) a single-molecule sequencer comprising a polymerase having at least one fluorescent tag associated therewith; (c) an excitation light source adapted to excite the at least one fluorescent tag on the polymerase or on the monomers; and (d) a fluorescent light detector adapted to detect at least an intensity of emitted fluorescent light from the at least one fluorescent tag on the polymerase; where the intensity change of at least one fluorescent tag emits or fails to emit fluorescent light each time a new nucleotide or nucleotide analog is polymerized into a complementary sequence and either the duration of the emission or lack of emission or the wavelength range of the emitted light evidences the particular nucleotide or nucleotide analog polymerized into the sequence so that at the completion of the sequencing the data stream is retrieved.

The present invention provides a system for storing and retrieving data comprising: (a) a sequence of nucleotides or nucleotide analogs representing a given data stream; (b) a single-molecule sequencer comprising a polymerase having at least one fluorescent tag covalently attached thereto; (c) an excitation light source adapted to excite at least one fluorescent tag on the polymerase; and (d) a fluorescent light detector adapted to detect emitted fluorescent light from at least one fluorescent tag on the polymerase; where at least one fluorescent tag emits or fails to emit fluorescent light each time a new nucleotide or nucleotide analog is polymerized into a complementary sequence and either the duration of the emission or lack of emission or the wavelength range of the emitted light evidences the particular nucleotide or nucleotide analog polymerized into the sequence so that at the completion of the sequencing the data stream is retrieved.

The present invention provides a system for storing and retrieving data comprising: (a) a sequence of nucleotides or nucleotide analogs representing a given data stream; (b) a single-molecule sequencer comprising a polymerase having at least one fluorescent tag covalently attached thereto; (c) an excitation light source adapted to excite the at least one fluorescent tag on the polymerase or the monomers; and (d) a fluorescent light detector adapted to detect emitted fluorescent light from at least one fluorescent tag on the polymerase or the monomers; where at least one fluorescent tag emits or fails to emit fluorescent light each time a new nucleotide or nucleotide analog is polymerized into a complementary sequence and either the duration of the emission or lack of emission or the wavelength range of the emitted light evidences the particular nucleotide or nucleotide analog polymerized into the sequence so that at the completion of the sequencing the data stream is retrieved.

The present invention provides a method for sequencing a molecular sequence comprising the steps of: (a) a sequence of nucleotides or nucleotide analogs representing a given data stream; (b) a single-molecule sequencer comprising a polymerase having at least one fluorescent tag covalently attached thereto; (c) an excitation light source adapted to excite at least one fluorescent tag on the polymerase or the monomers; and (d) a fluorescent light detector adapted to detect emitted fluorescent light from at least one fluorescent tag on the polymerase; where at least one fluorescent tag emits or fails to emit fluorescent light each time a new nucleotide or nucleotide analog is polymerized into a complementary sequence and either the duration of the emission or lack of emission or the wavelength range of the emitted light evidences the particular nucleotide or nucleotide analog polymerized into the sequence so that at the completion of the sequencing the data stream is retrieved.
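By way of a hedged illustration, a data stream could be represented as a sequence of nucleotides and recovered from the complementary sequence that the polymerase synthesizes. The two-bits-per-base mapping below is an assumption chosen for simplicity and is not specified in this disclosure; the function names are likewise hypothetical, and strand orientation is ignored.

```python
# Hypothetical mapping of bit pairs to bases; not taken from the disclosure.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def encode(data: bytes) -> str:
    """Represent a byte string as a base sequence, two bits per base."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def complement(bases: str) -> str:
    """Return the base-by-base complement (strand direction is ignored)."""
    return "".join(COMPLEMENT[base] for base in bases)


def decode(bases: str) -> bytes:
    """Recover the byte string from a base sequence produced by encode()."""
    bits = "".join(BASE_TO_BITS[base] for base in bases)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


stored = encode(b"Hi")            # the stored molecule's sequence
read_out = complement(stored)     # what a polymerase would synthesize
retrieved = decode(complement(read_out))
print(stored, read_out, retrieved)
```

The sequencer observes the complementary strand, so the sketch complements the read-out once more before decoding.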

The present invention provides a method for synthesizing a γ-phosphate modified nucleotide comprising the steps of attaching a molecular tag to a pyrophosphate group and contacting the modified pyrophosphate with a dNMP to produce a γ-phosphate tagged dNTP.

The present invention provides a method for 5′ end-labeling a biomolecule comprising the step of contacting the biomolecule with a kinase able to transfer a γ-phosphate of a γ-phosphate labeled ATP to the 5′ end of the biomolecule resulting in a covalently modified biomolecule.

The present invention provides a method for end-labeling a polypeptide or carbohydrate comprising the step of contacting the polypeptide or carbohydrate with an agent able to transfer an atomic or molecular tag to either a carboxy or amino end of a protein or polypeptide or to either the γ-phosphate of a γ-phosphate labeled ATP to the 5′ end of the biomolecule resulting in a covalently modified biomolecule.

DESCRIPTION OF THE DRAWINGS

The invention can be better understood with reference to the following detailed description together with the appended illustrative drawings in which like elements are numbered the same:

FIG. 1 depicts FRET activity as a function of distance separating the fluorescent donor and acceptor;

FIG. 2 depicts the open and closed ternary complex forms of the large fragment of Taq DNA pol I (Klentaq 1);

FIGS. 3A-C depict an overlay between 3ktq (closed ‘black’) and 1tau (‘open light blue’), the large fragment of Taq DNA polymerase I;

FIG. 4 depicts an image of a 20% denaturing polyacrylamide gel containing size separated radiolabeled products from DNA extension experiments involving γ-ANS-phosphate-dATP;

FIG. 5 depicts an image of (A) the actual gel, (B) a lightened phosphorimage and (C) an enhanced phosphorimage of products generated in DNA extension reactions using γ-ANS-phosphate-dNTPs;

FIG. 6 depicts an image of (A) 6% denaturing polyacrylamide gel, (B) a lightened phosphorimage of the actual gel, and (C) an enhanced phosphorimage of the actual gel containing products generated in DNA extension reactions using γ-ANS-phosphate-dNTPs;

FIG. 7 depicts an image of (A) the actual gel, (B) a lightened phosphorimage of the actual gel, and (C) an enhanced phosphorimage of the actual gel;

FIG. 8 depicts data for the Klenow fragment from E. coli DNA polymerase I incorporation of gamma-modified nucleotides;

FIG. 9 depicts data for the Pfu DNA polymerase incorporation of gamma-modified nucleotides;

FIG. 10 depicts data for the HIV-1 reverse transcriptase incorporation of gamma-tagged nucleotides;

FIG. 11 depicts experimental results for the native T7 DNA polymerase and SEQUENASE® incorporation of gamma-tagged nucleotides; and

FIG. 12 depicts reaction products produced when the four natural nucleotides (dATP, dCTP, dGTP and dTTP) are used in the synthesis reaction (solid line) and reaction products produced when base-modified nucleotides are used in the synthesis reaction.

DETAILED DESCRIPTION OF THE INVENTION

The inventors have devised a methodology using tagged monomers such as dNTPs and/or tagged polymerizing agents such as polymerase and/or tagged agents associated with the polymerizing agent such as polymerase associated proteins or probes to directly read out the exact monomer sequence such as a base sequence of an RNA or DNA sequence during polymerase activity. The methodology of this invention is adaptable to protein synthesis or to carbohydrate synthesis or to the synthesis of any molecular sequence where the sequence of monomers provides usable information such as the sequence of an RNA or DNA molecule, a protein, a carbohydrate, a mixed biomolecule or an inorganic or organic sequence of monomers which stores a data stream. The methods and apparatuses using these methods are designed to create new ways to address basic research questions such as monitoring conformation changes occurring during replication and assaying polymerase incorporation fidelity in a variety of sequence contexts. The single-molecule detection systems of this invention are designed to improve fluorescent molecule chemistry, computer modeling, base-calling algorithms, and genetic engineering of biomolecules, especially for real-time or near real-time sequencing. The inventors have also found that the methodology can be adapted to depolymerizing agents such as exonucleases where the polymer sequence is determined by depolymerization instead of polymerization. Moreover, the single-molecule systems of this invention are amenable to parallel and/or massively parallel assays, where tagged polymerases are patterned in arrays on a substrate. The data collected from such arrays can be used to improve sequence confidence and/or to simultaneously sequence DNA regions from many different sources to identify similarities or differences.
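One way that data from an array of tagged polymerases might improve sequence confidence is by combining several independent reads of the same region. The sketch below assumes the reads are already aligned and of equal length and uses a simple per-position majority vote; the function name and the example reads are hypothetical, and the disclosure does not prescribe this particular procedure.

```python
from collections import Counter


def consensus(reads):
    """Return (base, agreement fraction) for each position of aligned reads.

    Assumes equal-length, pre-aligned reads; ties are resolved in favor of
    the base seen first at that position.
    """
    result = []
    for column in zip(*reads):
        base, count = Counter(column).most_common(1)[0]
        result.append((base, count / len(column)))
    return result


# Hypothetical reads of one region from three polymerases in an array.
example_reads = ["ACGT", "ACGA", "ACGT"]
calls = consensus(example_reads)
print("".join(base for base, _ in calls))
print([round(fraction, 2) for _, fraction in calls])
```

The agreement fraction is only a crude stand-in for a confidence value; a real base caller would weight each read by its signal quality.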

The pattern of emission signals is collected, either directly, such as by an Intensified Charge Coupled Device (ICCD), or through an intermediate or series of intermediates to amplify signal prior to electronic detection, where the signals are decoded and confidence values are assigned to each base to reveal the sequence complementary to that of the template. Thus, the present invention also provides techniques for amplifying the fluorescent light emitted from a fluorescent tag using physical light amplification techniques or molecular cascading agents to amplify the light produced by single-molecule fluorescent events.

The single-molecule DNA sequencing systems of this invention have the potential to replace current DNA sequencing technologies, because the methodology can decrease time, labor, and costs associated with the sequencing process, and can lead to highly scalable sequencing systems, improving the DNA sequence discovery process by at least one to two orders of magnitude per reaction.

The single-molecule DNA sequencing technology of this invention can: (1) make it easier to classify an organism or identify variations within an organism by simply sequencing the genome or a portion thereof; (2) make rapid identification of a pathogen or a genetically-modified pathogen easier, especially in extreme circumstances such as in pathogens used in warfare; and (3) make rapid identification of persons for either law enforcement or military applications easier.

One embodiment of the single-molecule sequencing technology of this invention involves strategically positioning a pair of tags on a DNA polymerase so that as a dNTP is incorporated during the polymerization reaction, the tags change relative separation. This relative change causes a change in a detectable property, such as the intensity and/or frequency of fluorescence from one or both of the tags. A time profile of these changes in the detectable property evidences each monomer incorporation event and provides evidence about which particular dNTP is being incorporated at each incorporation event. The pair of tags do not have to be covalently attached to the polymerase, but can be attached to molecules that associate with the polymerase in such a way that the relative separation of the tags changes during base incorporation.

Another embodiment of the single-molecule sequencing technology of this invention involves a single tag strategically positioned on a DNA polymerase that interacts with a tag on a dNTP or separate tags on each dNTP. The tags could be different for each dNTP such as color-coded tags which emit a different color of fluorescent light. As the next dNTP is incorporated during the polymerization process, the identity of the base is indicated by a signature fluorescent signal (color) or a change in a fluorescent signal intensity and/or frequency. The rate of polymerase incorporation can be varied and/or controlled to create an essentially real-time or near real-time readout of polymerase activity and base sequence. Sequence data can be collected at a rate of >100,000 bases per hour from each polymerase.
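To put the stated rate in perspective, the arithmetic can be sketched as follows. The figure of 100,000 bases per hour is the lower bound given above; the genome size, the 24-hour window and the single-pass coverage are assumptions introduced only for this worked example, and practical limits such as read length and error rates are ignored.

```python
import math

BASES_PER_HOUR = 100_000  # lower bound stated in the text, per polymerase


def bases_per_second(rate_per_hour=BASES_PER_HOUR):
    """Convert the per-polymerase rate from bases/hour to bases/second."""
    return rate_per_hour / 3600


def polymerases_needed(genome_bases, hours, rate_per_hour=BASES_PER_HOUR,
                       coverage=1):
    """Polymerases required to read genome_bases * coverage within hours."""
    return math.ceil(genome_bases * coverage / (rate_per_hour * hours))


# Assumed example: a 3-billion-base genome read once within 24 hours.
print(round(bases_per_second(), 1))
print(polymerases_needed(3_000_000_000, 24))
```

Under these assumptions each polymerase reads roughly 28 bases per second, and on the order of a thousand polymerases working in parallel would cover the assumed genome once in a day.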

In another embodiment of the single-molecule sequencing technology of this invention, the tagged polymerases each include a donor tag and an acceptor tag situated or located on or within the polymerase, where the distance between the tags changes during dNTP binding, dNTP incorporation and/or chain extension. This change in inter-tag distance results in a change in the intensity and/or wavelength of emitted fluorescent light from the fluorescing tag. Monitoring the changes in intensity and/or frequency of the emitted light provides information or data about polymerization events and the identity of incorporated bases.
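The dependence of donor-acceptor energy transfer on inter-tag distance is commonly described by the Förster relation, E = 1 / (1 + (r/R0)^6), where r is the donor-acceptor distance and R0 is the distance at which transfer is 50% efficient. The sketch below simply evaluates that standard relation; the R0 value and the distances are assumed for illustration and are not values from this disclosure.

```python
def fret_efficiency(distance_nm, forster_radius_nm):
    """Standard Förster relation: E = 1 / (1 + (r / R0) ** 6)."""
    if distance_nm < 0 or forster_radius_nm <= 0:
        raise ValueError("distance must be >= 0 and R0 must be > 0")
    return 1.0 / (1.0 + (distance_nm / forster_radius_nm) ** 6)


# Assumed R0 of 5 nm; distances chosen only to show the steep dependence.
for r in (3.0, 5.0, 7.0):
    print(r, round(fret_efficiency(r, 5.0), 3))
```

Because of the sixth-power dependence, a small change in inter-tag distance near R0 produces a large change in transfer efficiency, which is why a conformational change of the polymerase can yield a measurable change in emitted light.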

In another embodiment, the tags on the polymerases are designed to interact with the tags on the dNTPs, where the interaction changes a detectable property of one or both of the tags. Each fluorescently tagged polymerase is monitored for polymerization using tagged dNTPs to determine the efficacy of base incorporation data derived therefrom. Specific assays and protocols have been developed along with specific analytical equipment to measure and quantify the fluorescent data allowing the determination and identification of each incorporated dNTP. Concurrently, the inventors have identified tagged dNTPs that are polymerized by suitable polymerases and have developed software that analyzes the fluorescence emitted from the reaction and interprets base identity. One skilled in the art will recognize that appropriate fluorescently active pairs are well-known in the art and commercially available from such vendors as Molecular Probes located in Oregon or Biosearch Technologies, Inc. in Novato, Calif.

The tagged DNA polymerases for use in this invention are genetically engineered to provide one or more tag binding sites that allow the different embodiments of this invention to operate. Once a suitable polymerase candidate is identified, specific amino acids within the polymerase are mutated and/or modified by reactions well-known in the art; provided, however, that the mutation and/or modification do not significantly adversely affect polymerization efficiency. The mutated and/or modified amino acids are adapted to facilitate attachment of a tag such as a dye or fluorescent donor or acceptor molecule in the case of light activated tags. Once formed, the engineered polymerase can be contacted with one or more appropriate tags and used in the apparatuses and methods of this invention.

Engineering a polymerase to function as a direct molecular sensor of DNA base identity provides a route to a fast and potentially real-time enzymatic DNA sequencing system. The single-molecule DNA sequencing system of this invention can significantly reduce time, labor, and costs associated with the sequencing process and is highly scalable. The single-molecule DNA sequencing system of this invention: (1) can improve the sequence discovery process by at least two orders of magnitude per reaction; (2) is not constrained by the length limitations associated with the degradation-based, single-molecule methods; and (3) allows direct sequencing of desired (target) DNA sequences, especially genomes without the need for cloning or PCR amplification, both of which introduce errors in the sequence. The systems of this invention can make easier the task of classifying an organism or identifying variations within an organism by simply sequencing the genome in question or any desired portion of the genome. The system of this invention is adapted to rapidly identify pathogens or engineered pathogens, which has importance for assessing health-related effects, and for general DNA diagnostics, including cancer detection and/or characterization, genome analysis, or a more comprehensive form of genetic variation detection. The single-molecule DNA sequencing system of this invention can become an enabling platform technology for single-molecule genetic analysis.

The single-molecule sequencing systems of this invention have the following advantages: (1) the systems eliminate sequencing reaction processing, gel or capillary loading, electrophoresis, and data assembly; (2) the systems result in significant savings in labor, time, and costs; (3) the systems allow near real-time or real-time data acquisition, processing and determination of incorporation events (timing, duration, etc.), base sequence, etc.; (4) the systems allow parallel or massively parallel sample processing in microarray format; (5) the systems allow rapid genome sequencing, in time frames of a day or less; (6) the systems require a very small amount of material for analysis; (7) the systems allow rapid genetic identification, screening and characterization of animals including humans or pathogens; (8) the systems allow large increases in sequence throughput; (9) the systems can avoid error introduced in PCR, RT-PCR, and transcription processes; (10) the systems can allow accurate sequence information for allele-specific mutation detection; (11) the systems allow rapid medical diagnostics, e.g., Single Nucleotide Polymorphism (SNP) detection; (12) the systems allow improvement in basic research, e.g., examination of polymerase incorporation rates in a variety of different sequence contexts; analysis of errors in different contexts; epigenotypic analysis; analysis of protein glycosylation; protein identification; (13) the systems allow the creation of new robust (rugged) single-molecule detection apparatus; (14) the systems allow the development of systems and procedures that are compatible with biomolecules; (15) the systems allow the development of genetic nanomachines or nanotechnology; (16) the systems allow the construction of large genetic databases; and (17) the systems have high sensitivity for low mutation event detection.

Brief Overview of Single-Molecule DNA Sequencing

In one embodiment of the single-molecule DNA sequencing system of this invention, a single tag is attached to an appropriate site on a polymerase and a unique tag is attached to each of the four nucleotides: dATP, dTTP, dCTP and dGTP. The tag on each dNTP is designed to have a unique emission signature (i.e., a different emission frequency spectrum or color), which is directly detected upon incorporation. As a tagged dNTP is incorporated into a growing DNA polymer, a characteristic fluorescent signal or base emission signature is emitted due to the interaction of the polymerase tag and the dNTP tag. The fluorescent signals, i.e., the emission intensity and/or frequency, are then detected and analyzed to determine the DNA base sequence.

One criterion for selection of the tagged polymerase and/or dNTPs for use in this invention is that the tags on the polymerase and/or the dNTPs do not interfere with Watson-Crick base-pairing or significantly adversely impact polymerase activity. The inventors have found that dNTPs containing tags attached to the terminal (gamma) phosphate are incorporated by a native Taq polymerase either in combination with untagged dNTPs or using only tagged dNTPs. Tagging the dNTPs on the β and/or γ phosphate group is preferred because the resulting DNA strands do not include any of the dNTP tags in their molecular make up, minimizing enzyme distortion and background fluorescence.

One embodiment of the sequencing system of this invention involves placing a fluorescent donor such as fluorescein or a fluorescein-type molecule on the polymerase and unique fluorescent acceptors such as a d-rhodamine or a similar molecule on each dNTP, where each unique acceptor, when interacting with the donor on the polymerase, generates a fluorescent spectrum including at least one distinguishable frequency or spectral feature. As an incoming, tagged dNTP is bound by the polymerase for DNA elongation, the detected fluorescent signal or spectrum is analyzed and the identity of the incorporated base is determined.

Another embodiment of the sequencing system of this invention involves a fluorescent tag on the polymerase and unique quenchers on the dNTPs, where the quenchers preferably have distinguishable quenching efficiencies for the polymerase tag. Consequently, the identity of each incoming quencher-tagged dNTP is determined by its unique quenching efficiency of the emission of the polymerase fluorescent tag. Again, the signals produced during incorporation are detected and analyzed to determine each base incorporated, the sequence of which generates the DNA base sequence.

Reagents

Suitable polymerizing agents for use in this invention include, without limitation, any polymerizing agent that polymerizes monomers relative to a specific template such as a DNA or RNA polymerase, reverse transcriptase, or the like or that polymerizes monomers in a step-wise fashion.

Suitable polymerases for use in this invention include, without limitation, any polymerase that can be isolated from its host in sufficient amounts for purification and use and/or genetically engineered into other organisms for expression, isolation and purification in amounts sufficient for use in this invention such as DNA or RNA polymerases that polymerize DNA, RNA or mixed sequences, into extended nucleic acid polymers. Preferred polymerases for use in this invention include mutants or mutated variants of native polymerases where the mutants have one or more amino acids replaced by amino acids amenable to attaching an atomic or molecular tag, which has a detectable property. Exemplary DNA polymerases include, without limitation, HIV1-Reverse Transcriptase using either RNA or DNA templates, DNA pol I from T. aquaticus or E. coli, Bacteriophage T4 DNA pol, T7 DNA pol or the like. Exemplary RNA polymerases include, without limitation, T7 RNA polymerase or the like.

Suitable depolymerizing agents for use in this invention include, without limitation, any depolymerizing agent that depolymerizes monomers in a step-wise fashion such as exonucleases in the case of DNA, RNA or mixed DNA/RNA polymers, proteases in the case of polypeptides and enzymes or enzyme systems that sequentially depolymerize polysaccharides.

Suitable monomers for use in this invention include, without limitation, any monomer that can be step-wise polymerized into a polymer using a polymerizing agent. Suitable nucleotides for use in this invention include, without limitation, naturally occurring nucleotides, synthetic analogs thereof, analogs having atomic and/or molecular tags attached thereto, or mixtures or combinations thereof.

Suitable atomic tags for use in this invention include, without limitation, any atomic element amenable to attachment to a specific site in a polymerizing agent or dNTP, especially Europium shift agents, NMR-active atoms or the like.

Suitable molecular tags for use in this invention include, without limitation, any molecule amenable to attachment to a specific site in a polymerizing agent or dNTP, especially fluorescent dyes such as d-Rhodamine acceptor dyes including dichloro[R110], dichloro[R6G], dichloro[TAMRA], dichloro[ROX] or the like, fluorescein donor dye including fluorescein, 6-FAM, or the like; Acridine including Acridine orange, Acridine yellow, Proflavin, pH 7, or the like; Aromatic Hydrocarbon including 2-Methylbenzoxazole, Ethyl p-dimethylaminobenzoate, Phenol, Pyrrole, benzene, toluene, or the like; Arylmethine Dyes including Auramine O, Crystal violet, H₂O, Crystal violet, glycerol, Malachite Green or the like; Coumarin dyes including 7-Methoxycoumarin-4-acetic acid, Coumarin 1, Coumarin 30, Coumarin 314, Coumarin 343, Coumarin 6 or the like; Cyanine Dye including 1,1′-diethyl-2,2′-cyanine iodide, Cryptocyanine, Indocarbocyanine (C3) dye, Indodicarbocyanine (C5) dye, Indotricarbocyanine (C7) dye, Oxacarbocyanine (C3) dye, Oxadicarbocyanine (C5) dye, Oxatricarbocyanine (C7) dye, Pinacyanol iodide, Stains all, Thiacarbocyanine (C3) dye, ethanol, Thiacarbocyanine (C3) dye, n-propanol, Thiadicarbocyanine (C5) dye, Thiatricarbocyanine (C7) dye, or the like; Dipyrrin dyes including N,N′-Difluoroboryl-1,9-dimethyl-5-(4-iodophenyl)-dipyrrin, N,N′-Difluoroboryl-1,9-dimethyl-5-[(4-(2-trimethylsilylethynyl), N,N′-Difluoroboryl-1,9-dimethyl-5-phenydipyrrin, or the like; Merocyanines including 4-(dicyanomethylene)-2-methyl-6-(p-dimethylaminostyryl)-4H-pyran (DCM), acetonitrile, 4-(dicyanomethylene)-2-methyl-6-(p-dimethylaminostyryl)-4H-pyran (DCM), methanol, 4-Dimethylamino-4′-nitrostilbene, Merocyanine 540, or the like; Miscellaneous Dye including 4′,6-Diamidino-2-phenylindole (DAPI), 4′,6-Diamidino-2-phenylindole (DAPI), dimethylsulfoxide, 7-Benzylamino-4-nitrobenz-2-oxa-1,3-diazole, Dansyl glycine, H₂O, Dansyl glycine, dioxane, HOECHST® 33258, DMF, HOECHST® 33258, H₂O, Lucifer yellow CH, Piroxicam, Quinine sulfate, 0.05 M
H₂SO₄, Quinine sulfate,0.5 M H₂SO₄, Squarylium dye III, or the like; Oligophenylenes including2,5-Diphenyloxazole (PPO), Biphenyl, POPOP, p-Quaterphenyl, p-Terphenyl,or the like; Oxazines including Cresyl violet perchlorate, Nile Blue,methanol, Nile Red, Nile blue, ethanol, Oxazine 1, Oxazine 170, or thelike; Polycyclic Aromatic Hydrocarbons including9,10-Bis(phenylethynyl)anthracene, 9,10-Diphenylanthracene, Anthracene,Naphthalene, Perylene, Pyrene, or the like; polyene/polyynes including1,2-diphenylacetylene, 1,4-diphenylbutadiene, 1,4-diphenylbutadiene,1,6-Diphenylhexatriene, Beta-carotene, Stilbene, or the like;Redox-active Chromophores including Anthraquinone, Azobenzene,Benzoquinone, Ferrocene, Riboflavin, Tris(2,2′-bipyridyl)ruthenium(II),Tetrapyrrole, Bilirubin, Chlorophyll a, diethyl ether, Chlorophyll a,methanol, Chlorophyll b, Diprotonated-tetraphenylporphyrin, Hematin,Magnesium octaethylporphyrin, Magnesium octaethylporphyrin (MgOEP),Magnesium phthalocyanine (MgPc), PrOH, Magnesium phthalocyanine (MgPc),pyridine, Magnesium tetramesitylporphyrin (MgTMP), Magnesiumtetraphenylporphyrin (MgTPP), Octaethylporphyrin, Phthalocyanine (Pc),Porphin, Tetra-t-butylazaporphine, Tetra-t-butylnaphthalocyanine,Tetrakis(2,6-dichlorophenyl)porphyrin, Tetrakis(o-aminophenyl)porphyrin,Tetramesitylporphyrin (TMP), Tetraphenylporphyrin (TPP), Vitamin B12,Zinc octaethylporphyrin (ZnOEP), Zinc phthalocyanine (ZnPc), pyridine,Zinc tetramesitylporphyrin (ZnTMP), Zinc tetramesitylporphyrin radicalcation, Zinc tetraphenylporphyrin (ZnTPP), or the like; Xanthenesincluding Eosin Y, Fluorescein, basic ethanol, Fluorescein, ethanol,Rhodamine 123, Rhodamine 6G, Rhodamine B, Rose bengal, Sulforhodamine101, or the like; or mixtures or combination thereof or syntheticderivatives thereof or FRET fluorophore-quencher pairs including DLO-FB1(5′-FAM/3′-BHQ-1) DLO-TEB1 (5′-TET/3′-BHQ-1), DLO-JB1 (5′-JOE/3′-BHQ-1),DLO-HB1 (5′-HEX/3′-BHQ-1), DLO-C3B2 (5′-Cy3/3′-BHQ-g), 
DLO-TAB2(5′-TAMRA/3′-BHQ-g), DLO-RB2 (5′-ROX/3′-BHQ-g), DLO-C5B3(5′-Cy5/3′-BHQ-3), DLO-C55B3 (5′-Cy5.5/3′-BHQ-3), MBO-FB1(5′-FAM/3′-BHQ-1), MBO-TEB1 (5′-TET/3′-BHQ-1), MBO-JB1(5′-JOE/3′-BHQ-1), MBO-HB1 (5′-HEX/3′-BHQ-1), MBO-C3B2(5′-Cy3/3′-BHQ-g), MBO-TAB2 (5′-TAMRA/3′-BHQ-g), MBO-RB2(5′-ROX/3′-BHQ-g); MBO-C5B3 (5′-Cy5/3′-BHQ-3), MBO-C55B3(5′-Cy5.5/3′-BHQ-3) or similar FRET pairs available from BiosearchTechnologies, Inc. of Novato, Calif., tags with nmr active groups, tagswith spectral features that can be easily identified such as IR, far IR,visible UV, far UV or the like.

Enzyme Choice

The inventors have found that the DNA polymerase from Thermus aquaticus (Taq DNA polymerase I) is ideally suited for use in the single-molecule apparatuses, systems and methods of this invention. Taq DNA Polymerase, sometimes simply referred to herein as Taq, has many attributes that the inventors can utilize in constructing tagged polymerases for use in the inventions disclosed in this application. Of course, ordinary artisans will recognize that other polymerases can be adapted for use in the single-molecule sequencing systems of this invention.

Since Taq DNA polymerase I tolerates so many mutations within or near its active site (as reviewed in Patel et al., J. Mol. Biol., volume 308, pages 823-837, and incorporated herein by reference), the enzyme is more tolerant of enzyme tagging modification(s) and also able to incorporate a wider range of modified nucleotide substrates.

Crystal Structures are Available for Taq DNA Polymerase

There are 13 structures solved for Taq DNA polymerase, with or without DNA template/primer, dNTP, or ddNTP, which provide sufficient information for the selection of amino acid sites within the polymerase to which an atomic and/or molecular tag such as a fluorescent tag can be attached without adversely affecting polymerase activity. See, e.g., Eom et al., 1996; Li et al., 1998a; Li et al., 1998b. Additionally, the inventors have written a program to aid in identifying optimal tag addition sites. The program compares structural data associated with the Taq polymerase in its open and closed forms to identify regions in the polymerase structure that are optimally positioned to maximize the difference between the conformational extremes in the separation between a tag on the polymerase and the dNTP, or to maximize a change in separation between two tags on the polymerase, thereby increasing or maximizing changes in a detectable property of one of the tags or tag pair.
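The kind of comparison such a program performs can be sketched as follows. This is only an illustrative sketch, not the inventors' program: it assumes that the open and closed structures have already been superposed in a common coordinate frame, that each candidate site is represented by a single coordinate, and that the change in FRET efficiency is the quantity to be ranked. The function names, residue labels and the Förster distance used are hypothetical.

```python
from math import dist


def fret_efficiency(r, r0):
    """FRET efficiency E = 1 / (1 + (R/R0)^6) for donor-acceptor distance r."""
    return 1.0 / (1.0 + (r / r0) ** 6)


def rank_tag_sites(open_coords, closed_coords, dntp_position, r0):
    """Rank candidate residues by the gain in FRET efficiency on closing.

    open_coords / closed_coords: dict mapping a residue label to an (x, y, z)
    coordinate in angstroms (assumed to be in the same reference frame).
    dntp_position: (x, y, z) of the tagged phosphate of the bound dNTP.
    r0: assumed Forster distance of the dye pair, in angstroms.
    Returns (label, r_open, r_closed, delta_E) tuples, best candidate first.
    """
    ranked = []
    # Only residues present in both structures can be compared.
    for label in open_coords.keys() & closed_coords.keys():
        r_open = dist(open_coords[label], dntp_position)
        r_closed = dist(closed_coords[label], dntp_position)
        delta_e = fret_efficiency(r_closed, r0) - fret_efficiency(r_open, r0)
        ranked.append((label, r_open, r_closed, delta_e))
    ranked.sort(key=lambda item: item[3], reverse=True)
    return ranked


if __name__ == "__main__":
    # Hypothetical coordinates, for illustration only.
    open_form = {"site_A": (60.0, 0.0, 0.0), "site_B": (80.0, 0.0, 0.0)}
    closed_form = {"site_A": (30.0, 0.0, 0.0), "site_B": (75.0, 0.0, 0.0)}
    for row in rank_tag_sites(open_form, closed_form, (0.0, 0.0, 0.0), r0=50.0):
        print(row)
```

A real implementation would read atomic coordinates from the crystal structures and would also need to account for linker length and dye orientation, which this sketch ignores.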

Taq DNA Polymerase is Efficiently Expressed in E. coli

The Taq DNA polymerase is efficiently expressed in E. coli, allowing efficient production and purification of the nascent polymerase and variants thereof for rapid identification, characterization and optimization of an engineered Taq DNA polymerase for use in the single-molecule DNA sequencing systems of this invention.

No Cysteines are Present in the Protein Sequence

The Taq DNA polymerase contains no cysteines, which allows the easy generation of cysteine-containing mutants in which a single cysteine is placed or substituted for an existing amino acid at strategic sites, where the inserted cysteine serves as a tag attachment site.

The Processivity of the Enzyme can be Modified

Although native Taq DNA polymerase may not represent an optimal polymerase for the sequencing system of this invention because it is not a very processive polymerase (50-80 nucleotides are incorporated before dissociation), the low processivity may be compensated for by appropriately modifying the base calling software. Alternatively, the processivity of the Taq DNA Polymerase can be enhanced through genetic engineering by inserting into the polymerase gene a processivity enhancing sequence. Highly processive polymerases are expected to minimize complications that may arise from template dissociation effects, which can alter polymerization rate. The processivity of Taq can be genetically altered by introducing the 76 amino acid ‘processivity domain’ from T7 DNA polymerase between the H and H₁ helices (at the tip of the ‘thumb’ region within the polymerase) of Taq. The processivity domain also includes the thioredoxin binding domain (TBD) from T7 DNA polymerase, causing the Taq polymerase to be thioredoxin-dependent and increasing both the processivity and specific activity of Taq polymerase. See, e.g., Bedford et al., 1997; Bedford et al., 1999.

Taq DNA Polymerase Possesses a 5′ to 3′ Exonuclease Activity and is Thermostable

Single-stranded M13 DNA and synthetic oligonucleotides are used in the initial studies. After polymerase activity is optimized, the sequencing system can be used to directly determine sequence information from an isolated chromosome, which is a double-stranded DNA molecule. Generally, heating a sample of double-stranded DNA is sufficient to produce or maintain the DNA in single-stranded form for sequencing.

To favor the single-stranded state, the 5′ to 3′ exonuclease activity of the native Taq DNA polymerase in the enzyme engineered for single-molecule DNA sequencing is retained. This activity of the polymerase is exploited by the ‘TaqMan’ assay. The exonuclease activity removes a duplex strand that may renature downstream from the replication site using a nick-translation reaction mechanism. Synthesis from the engineered polymerase is initiated either by a synthetic oligonucleotide primer (if a specific reaction start is necessary) or by a nick in the DNA molecule (if multiple reactions are processed) to determine the sequence of an entire DNA molecule.

The Polymerase is Free from 3′ to 5′ Exonuclease Activity

The Taq DNA polymerase does not contain 3′ to 5′ exonuclease activity, which means that the polymerase cannot replace a base, for which a fluorescent signal was detected, with another base which would produce another signature fluorescent signal.

All polymerases make replication errors. The 3′ to 5′ exonuclease activity is used to proofread the newly replicated DNA strand. Since Taq DNA polymerase lacks this proofreading function, an error in base incorporation becomes an error in DNA replication. Error rates for Taq DNA polymerase are 1 error per ~100,000 bases synthesized, which is sufficiently low to assure a relatively high fidelity. See, e.g., Eckert and Kunkel, 1990; Cline et al., 1996. It has been suggested and verified for a polymerase that the elimination of this exonuclease activity uncovers a decreased fidelity during incorporation. Thus, Taq polymerase must, by necessity, be more accurate during initial nucleotide selection and/or incorporation, and is therefore an excellent choice for use in the present inventions.

The error rate of engineered polymerases of this invention is assayed by determining their error rates in synthesizing known sequences. The error rate determines the optimal number of reactions to be run in parallel so that sequencing information can be assigned with confidence. The optimal number can be 1 or 10 or more. For example, the inventors have discovered that base context influences polymerase accuracy and reaction kinetics, and this information is used to assign confidence values to individual base calls. However, depending on the goal of a particular sequencing project, it may be more important to generate a genome sequence as rapidly as possible. For example, it may be preferable to generate, or draft, the genome sequence of a pathogen at reduced accuracy for initial identification purposes or for fast screening of potential pathogens.
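One simple way to relate a measured error rate to the number of parallel reactions is a majority-vote estimate, sketched below. This is an illustrative model only: it assumes that errors in separate reactions are independent and, pessimistically, that all erroneous reads at a position agree with each other. The error rates and targets used are hypothetical, and the function names are not taken from any disclosed software.

```python
from math import comb


def majority_error(per_read_error, n_reads):
    """Probability that more than half of n_reads are wrong at one position.

    Assumes independent reads with the same per-base error probability.
    """
    k_min = n_reads // 2 + 1  # smallest number of wrong reads that wins a vote
    return sum(
        comb(n_reads, k) * per_read_error ** k * (1 - per_read_error) ** (n_reads - k)
        for k in range(k_min, n_reads + 1)
    )


def reads_needed(per_read_error, target_error, max_reads=101):
    """Smallest odd number of parallel reactions meeting target_error, or None."""
    for n in range(1, max_reads + 1, 2):
        if majority_error(per_read_error, n) <= target_error:
            return n
    return None


if __name__ == "__main__":
    # Hypothetical error rates and targets, for illustration only.
    for p, target in ((1e-5, 1e-4), (1e-2, 1e-3), (5e-2, 1e-6)):
        print(p, target, reads_needed(p, target))
```

A draft-quality project can accept a looser target and therefore fewer parallel reactions, which is the trade-off between speed and accuracy described above.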

Taq DNA Polymerase is the Enzyme of Choice for Single-Molecule DNA Sequencing

Engineering the polymerase to function as a direct molecular sensor of DNA base identity provides the fastest enzymatic DNA sequencing system possible. For the reasons detailed above, Taq DNA polymerase is the optimal enzyme to genetically modify and adapt for single-molecule DNA sequencing. Additionally, basic research questions concerning DNA polymerase structure and function during replication can be addressed using this technology, advancing single-molecule detection systems and molecular models in other disciplines. The inventors have found that native Taq DNA polymerase incorporates gamma-tagged dNTPs, yielding extended DNA polymers. Importantly, incorporation of a modified nucleotide is not detrimental to polymerase activity and extension of primer strands by incorporation of a γ-tagged nucleotide conforms to Watson-Crick base pairing rules.

Detecting Tagged Polymerase-Nucleotide Interactions

One preferred method for detecting polymerase-nucleotide interactions involves a fluorescence resonance energy transfer-based (FRET-based) method to maximize signal and minimize noise. A FRET-based method exists when the emission from an acceptor is more intense than the emission from a donor, i.e., the acceptor has a higher fluorescence quantum yield than the donor at the excitation frequency. The efficiency of the FRET method can be estimated from computational models. See, e.g., Furey et al., 1998; Clegg et al., 1993; Mathies et al., 1990. The efficiency of energy transfer (E) is computed from equation (1) as follows:

$$E = \frac{1}{1 + (R/R_0)^{6}} \qquad (1)$$

where R₀ is the Förster critical distance, i.e., the donor-acceptor distance at which E=0.5. R₀ (in Å) is calculated from equation (2):

$$R_0 = (9.79 \times 10^{3})\,(\kappa^{2}\,\eta^{-4}\,Q_D\,J_{DA})^{1/6} \qquad (2)$$

where η is the refractive index of the medium (η=1.4 for aqueous solution), κ² is a geometric orientation factor related to the relative angle of the two transition dipoles (κ² is generally assumed to be ⅔), J_DA [M⁻¹ cm³] is the overlap integral representing the normalized spectral overlap of the donor emission and acceptor absorption, and Q_D is the quantum yield of the donor. The overlap integral is computed from equation (3):

$$J_{DA} = \frac{\int F_D(\lambda)\,\varepsilon_A(\lambda)\,\lambda^{4}\,d\lambda}{\int F_D(\lambda)\,d\lambda} \qquad (3)$$

where F_D is the donor emission and ε_A is the acceptor absorption. Q_D is calculated from equation (4):

$$Q_D = Q_{RF}\,(I_D/I_{RF})\,(A_{RF}/A_D) \qquad (4)$$

where I_D and I_RF are the fluorescence intensities of the donor and a reference compound (fluorescein in 0.1 N NaOH), and A_RF and A_D are the absorbances of the reference compound and donor. Q_RF is the quantum yield of fluorescein in 0.1 N NaOH and is taken to be 0.90.
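Equations (2) through (4) can be evaluated numerically along the following lines. This is a minimal sketch under stated assumptions: spectra are supplied as sampled lists, wavelengths are expressed in cm and the acceptor extinction coefficient in M⁻¹ cm⁻¹ so that the overlap integral comes out in M⁻¹ cm³, and the integrals are approximated with the trapezoidal rule. The function names and all numeric inputs are hypothetical.

```python
def trapezoid(xs, ys):
    """Trapezoidal-rule integral of sampled values ys over abscissae xs."""
    return sum(
        0.5 * (ys[i] + ys[i + 1]) * (xs[i + 1] - xs[i]) for i in range(len(xs) - 1)
    )


def overlap_integral(wavelengths_cm, donor_emission, acceptor_epsilon):
    """Equation (3): normalized donor-emission / acceptor-absorption overlap."""
    weighted = [
        f * e * lam ** 4
        for f, e, lam in zip(donor_emission, acceptor_epsilon, wavelengths_cm)
    ]
    return trapezoid(wavelengths_cm, weighted) / trapezoid(wavelengths_cm, donor_emission)


def donor_quantum_yield(q_rf, i_d, i_rf, a_rf, a_d):
    """Equation (4): donor quantum yield relative to a reference compound."""
    return q_rf * (i_d / i_rf) * (a_rf / a_d)


def forster_distance(kappa2, refractive_index, q_d, j_da):
    """Equation (2): Forster critical distance in angstroms."""
    return 9.79e3 * (kappa2 * refractive_index ** -4 * q_d * j_da) ** (1.0 / 6.0)


if __name__ == "__main__":
    # Hypothetical, coarsely sampled spectra (510-530 nm expressed in cm).
    wavelengths = [5.1e-5, 5.2e-5, 5.3e-5]
    emission = [1.0, 1.0, 1.0]        # arbitrary units
    epsilon = [1.0e5, 1.0e5, 1.0e5]   # M^-1 cm^-1
    j_da = overlap_integral(wavelengths, emission, epsilon)
    q_d = donor_quantum_yield(0.90, i_d=50.0, i_rf=100.0, a_rf=0.05, a_d=0.05)
    print(j_da, q_d, forster_distance(2.0 / 3.0, 1.4, q_d, j_da))
```

With measured spectra, the sampling should be fine enough that the trapezoidal approximation does not dominate the uncertainty in R₀.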

R, the distance between the donor and acceptor, is measured by looking at different configurations (e.g., conformations) of the polymerase in order to obtain a conformationally averaged value. If both tags are on the polymerase, then R is the distance between the donor and acceptor in the open and closed conformations, while if the donor is on the polymerase and the acceptor on the dNTP, R is the distance between the donor and acceptor when the dNTP is bound to the polymerase and the polymerase is in its closed form.

The distance between the tagged γ-phosphate and the selected amino acid sites for labeling in the open versus closed polymerase conformation delineates optimal dye combinations. If the distance (R) between the donor and acceptor is the same as R₀ (the Förster critical distance), FRET efficiency (E) is 50%. If R is more than 1.5 R₀, equation (1) gives an energy transfer efficiency below about 0.08, and beyond about 2 R₀ the efficiency becomes negligible (E<0.02). Sites within the enzyme at which R/R₀ differ by more than 1.6 in the open versus closed forms are identified and, if necessary, these distances and/or distance differences can be increased through genetic engineering. A plot of FRET efficiency versus distance is shown in FIG. 1.
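The dependence of efficiency on distance described by equation (1) can be tabulated with a few lines of code. The sketch below simply evaluates equation (1) at several R/R₀ ratios and for one open/closed pair; the Förster distance and the site distances are hypothetical values chosen for illustration.

```python
def fret_efficiency(r, r0):
    """Equation (1): E = 1 / (1 + (R/R0)^6)."""
    if r0 <= 0:
        raise ValueError("r0 must be positive")
    return 1.0 / (1.0 + (r / r0) ** 6)


def efficiency_change(r_open, r_closed, r0):
    """Gain in FRET efficiency when a site moves from r_open to r_closed."""
    return fret_efficiency(r_closed, r0) - fret_efficiency(r_open, r0)


if __name__ == "__main__":
    r0 = 50.0  # hypothetical Forster distance, in angstroms
    for ratio in (0.5, 1.0, 1.5, 2.0):
        print(f"R/R0 = {ratio:.1f}  E = {fret_efficiency(ratio * r0, r0):.4f}")
    # Hypothetical site: 80 angstroms from the acceptor when open, 40 when closed.
    print(f"delta E = {efficiency_change(80.0, 40.0, r0):.4f}")
```

Because of the sixth-power dependence, a site whose R/R₀ ratio changes substantially between the open and closed forms produces a large change in signal, which is the basis for the site criterion stated above.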

Fluorescent Dye Selection Process

Dye sets are chosen to maximize energy transfer efficiency between a tagged dNTP and a tag on the polymerase when the polymerase is in its closed configuration and to minimize energy transfer efficiency between the tag on the dNTP (either non-productively bound or in solution) and the tag on the polymerase when the polymerase is in its open configuration. Given a molarity of each nucleotide in the reaction medium of no more than about 1 μM, an average distance between tagged nucleotides is calculated to be greater than or equal to about 250 Å. Because this distance is several fold larger than the distance separating sites on the polymerase in its open and closed conformations, minimal FRET background between the polymerase and free dNTPs is observed. Preferably, nucleotide concentrations are reduced below 1 μM. Reducing dNTP concentrations to less than 10% of the K_m further minimizes background fluorescence and provides a convenient method for controlling the rate of the polymerase reaction for real-time monitoring. Under such conditions, the velocity of the polymerization reaction is linearly proportional to the dNTP concentration and, thus, highly sensitive to regulation. Additionally, the use of a single excitation wavelength allows improved identification of unique tags on each dNTP. A single, lower-wavelength excitation laser is used to achieve high selectivity.
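Both quantitative points in this paragraph can be checked with a short calculation. The sketch below is a rough estimate under stated assumptions: the mean spacing between free nucleotides is taken as the cube root of the volume available per molecule, and the polymerase is assumed to follow simple Michaelis-Menten kinetics. The function names and the K_m and V_max values are hypothetical.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def mean_spacing_angstrom(molar_concentration):
    """Approximate mean distance between solute molecules, in angstroms.

    Uses spacing = (number density)^(-1/3), a uniform-distribution estimate.
    """
    molecules_per_cm3 = molar_concentration * AVOGADRO / 1000.0  # 1 L = 1000 cm^3
    spacing_cm = molecules_per_cm3 ** (-1.0 / 3.0)
    return spacing_cm * 1.0e8  # 1 cm = 1e8 angstroms


def michaelis_menten(substrate, vmax, km):
    """Reaction velocity v = Vmax * [S] / (Km + [S])."""
    return vmax * substrate / (km + substrate)


if __name__ == "__main__":
    for conc in (1e-6, 1e-7):
        print(f"{conc:.0e} M -> {mean_spacing_angstrom(conc):.0f} angstroms")
    # Hypothetical kinetic constants: Km = 10 uM, Vmax = 1 (arbitrary units).
    km, vmax = 10e-6, 1.0
    for fraction in (0.01, 0.05, 0.10):
        s = fraction * km
        exact = michaelis_menten(s, vmax, km)
        linear = vmax * s / km  # low-concentration limit
        print(f"[S] = {fraction:.2f} Km  v = {exact:.4f}  linear = {linear:.4f}")
```

Under these assumptions the spacing at 1 μM is well above the stated lower bound of about 250 Å, and at substrate concentrations below 10% of K_m the linear approximation differs from the full rate expression by less than 10%.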

In one preferred embodiment, a fluorescence donor is attached to a site on the polymerase comprising a replaced amino acid more amenable to donor attachment such as cysteine and four unique fluorescence acceptors are attached, one to each of the four dNTPs. For example, fluorescein is attached to a site on the polymerase and rhodamine, rhodamine derivatives and/or fluorescein derivatives are attached to each dNTP. Each donor-acceptor fluorophore pair is designed to have an absorption spectrum sufficiently distinct from the spectra of other pairs to allow separate identification after excitation. Preferably, the donor is selected such that the excitation light activates the donor, which then efficiently transfers the excitation energy to one of the acceptors. After energy transfer, the acceptor emits its unique fluorescence signature. The emission of the fluorescence donor must significantly overlap with the absorption spectra of the fluorescence acceptors for efficient energy transfer. However, the methods of this invention can also be performed using two, three or four unique fluorescence donor-acceptor pairs, by running parallel reactions.

Fluorophore choice is a function of not only its enzyme compatibility, but also its spectral and photophysical properties. For instance, it is critical that the acceptor fluorophore does not have any significant absorption at the excitation wavelength of the donor fluorophore, and less critical (but also desirable) is that the donor fluorophore does not have emission at the detection wavelength of the acceptor fluorophore. These spectral properties can be attenuated by chemical modifications of the fluorophore ring systems.

Although the dNTPs are amenable to tagging at several sites including the base, the sugar and the phosphate groups, the dNTPs are preferably tagged at the β and/or γ phosphate. Tagging the terminal phosphates of a dNTP has a unique advantage. When the incoming, tagged dNTP is bound to the active site of the polymerase, significant FRET from the donor on the polymerase to the acceptor on the dNTP occurs. The unique fluorescence of the acceptor identifies which dNTP is incorporated. Once the tagged dNTP is incorporated into the growing DNA chain, the fluorescence acceptor, which is now attached to the pyrophosphate group, is released to the medium with the cleaved pyrophosphate group. In fact, the growing DNA chain includes no fluorescence acceptor molecules at all. In essence, FRET occurs only between the donor on the polymerase and the incoming acceptor-labeled dNTP, one at a time. This approach is better than the alternative attachment of the acceptor to a site within the dNMP moiety of the dNTP or the use of multiply-modified dNTPs. If the acceptor is attached to a site other than the β or γ phosphate group, it becomes part of the growing DNA chain and the DNA chain will contain multiple fluorescence acceptors. Interference with the polymerization reaction and FRET measurements would likely occur.

If the fluorescence from the tagged dNTPs in the polymerizing medium (background) is problematic, collisional quenchers can be added to the polymerizing medium that do not covalently interact with the acceptors on the dNTPs and quench fluorescence from the tagged dNTPs in the medium. Of course, the quenchers are also adapted to have insignificant contact with the donor on the polymerase. To minimize interaction between the collisional quenchers and the donor on the polymerase, the polymerase tag is preferably localized internally and shielded from the collisional quenchers or the collisional quencher can be made sterically bulky or associated with a sterically bulky group to decrease interaction between the quencher and the polymerase.

Another preferred method for detecting polymerase-nucleotide interactions involves using nucleotide-specific quenching agents to quench the emission of a fluorescent tag on the polymerase. Thus, the polymerase is tagged with a fluorophore, while each dNTP is labeled with a quencher for the fluorophore. Typically, DABCYL (4-(4′-dimethylaminophenylazo)benzoic acid) is a universal quencher, which absorbs energy from a fluorophore, such as 5-(2′-aminoethyl)aminonaphthalene-1-sulfonic acid (AEANS), and dissipates it as heat. Preferably, a quencher is selected for each dNTP so that when each quencher is brought into close proximity to the fluorophore, a distinguishable quenching efficiency is obtained. Therefore, the degree of quenching is used to identify each dNTP as it is being incorporated into the growing DNA chain. One advantage of this preferred detection method is that fluorescence emission comes from a single source, rendering background noise negligible. Although less preferred, if only two or three suitable quenchers are identified, then two or three of the four dNTPs are labeled and a series of polymerization reactions is run, each time with a different set of the labeled dNTPs. Combining the results from these runs generates a complete sequence of the DNA molecule.
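Identifying a base from its degree of quenching can be sketched as a nearest-reference lookup. This is only an illustration of the idea: it assumes that the unquenched intensity of the polymerase tag is known, that each quencher has a previously calibrated quenching efficiency, and that a fixed tolerance decides whether a call is made. The reference efficiencies, the tolerance and the function names are hypothetical.

```python
def quenching_efficiency(observed_intensity, unquenched_intensity):
    """Fraction of the polymerase tag's emission removed by the quencher."""
    return 1.0 - observed_intensity / unquenched_intensity


def call_base(observed_intensity, unquenched_intensity, reference, tolerance=0.05):
    """Return the base whose calibrated efficiency is closest, or None.

    reference: dict mapping a base to its calibrated quenching efficiency.
    A call is made only if the closest reference lies within `tolerance`.
    """
    q = quenching_efficiency(observed_intensity, unquenched_intensity)
    base, ref_q = min(reference.items(), key=lambda item: abs(item[1] - q))
    return base if abs(ref_q - q) <= tolerance else None


if __name__ == "__main__":
    # Hypothetical calibrated efficiencies for the four quencher-tagged dNTPs.
    calibrated = {"A": 0.20, "C": 0.40, "G": 0.60, "T": 0.80}
    for intensity in (590.0, 210.0, 700.0):
        print(intensity, call_base(intensity, 1000.0, calibrated))
```

Returning no call for ambiguous events, rather than forcing an assignment, leaves room for downstream software to assign confidence values to individual base calls.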

Site Selection for Labeling the Taq Polymerase and dNTPs

Although the present invention is directed to attaching any type of atomic and/or molecular tag that has a detectable property, the processes for site selection and tag attachment are illustrated using a preferred class of tags, namely fluorescent tags.

Fluorescent Labeling of Polymerase and/or dNTPs

The fluorescence probes or quenchers attached to the polymerase or dNTPs are designed to minimize adverse effects on the DNA polymerization reaction. The inventors have developed synthetic methods for chemically tagging the polymerase and dNTPs with fluorescence probes or quenchers.

In general, the polymerase is tagged by using mutagenesis to replace a selected amino acid codon in the DNA sequence encoding the polymerase with a codon for an amino acid, such as cysteine, that more easily reacts with a molecular tag. Once a mutated DNA sequence is prepared, the mutant is inserted into E. coli for expression. After expression, the mutant polymerase is isolated and purified. The purified mutant polymerase is then tested for polymerase activity. After activity verification, the mutant polymerase is reacted with a slight molar excess of a desired tag to achieve near stoichiometric labeling. Alternatively, the polymerase can be treated with an excess amount of the tag and labeling followed as a function of time. The tagging reaction is then stopped when near stoichiometric labeling is obtained.

If the mutant polymerase includes several sites including the target residue that can undergo tagging with the desired molecular tag, then the tagging reaction can also be carried out under special reaction conditions such as using a protecting group or competitive inhibitor and a reversible blocking group, which are later removed. If the target amino acid residue in the mutant polymerase is close to the active dNTP binding site, a saturating level of a protecting group or a competitive inhibitor is first added to protect the target residue and a reversible blocking group is subsequently added to inactivate non-target residues. The protecting group or competitive inhibitor is then removed from the target residue, and the mutant polymerase is treated with the desired tag to label the target residue. Finally, the blocking groups are chemically removed from the non-target residues in the mutant polymerase to obtain a tagged mutant polymerase with the tag substantially to completely isolated on the target residue.

Alternatively, if the target residue is not near the active site, the polymerase can be treated with a blocking group to inactivate non-target residues. After removal of unreacted blocking group, the mutant polymerase is treated with the desired tag for labeling the target residue. Finally, the blocking groups are chemically removed from the non-target residues in the mutant polymerase to obtain the tagged mutant polymerase.

Amino Acid Site Selection for the Taq Polymerase

The inventors have identified amino acids in the Taq polymerase that are likely to withstand mutation and subsequent tag attachment such as the attachment of a fluorescent tag. While many sites are capable of cysteine replacement and tag attachment, preferred sites in the polymerase were identified using the following criteria: (1) they are not in contact with other proteins; (2) they do not alter the conformation or folding of the polymerase; and (3) they are not involved in the function of the protein. The selections were accomplished using a combination of mutational studies including sequence analysis data, computational studies including molecular docking data, and assays for polymerase activity and fidelity. After site mutation, computational studies will be used to refine the molecular models and help to identify other potential sites for mutation.

Regions of the protein surface that are not important for function were identified, indirectly, by investigating the variation in sequence as a function of evolutionary time and protein function using the evolutionary trace method. See, e.g., Lichtarge et al., 1996. In this approach, amino acid residues that are important for structure or function are found by comparing evolutionary mutations and structural homologies. The polymerases are ideal systems for this type of study, as there are many crystal and co-crystal structures and many available sequences. The inventors have excluded regions of structural/functional importance from site selection for mutation/labeling. In addition, visual inspection and overlays of available crystal structures of the polymerase in different conformational states provided further assistance in identifying amino acid sites near the binding site for dNTPs. Some of the chosen amino acid sites are somewhat internally located and preferably surround active regions in the polymerase that undergo changes during base incorporation, such as the dNTP binding regions, base incorporation regions, pyrophosphate release regions, etc. These internal sites are preferred because a tag on these sites shows reduced background signals during detection, i.e., reduced interaction between the polymerase enzyme and non-specifically associated tagged dNTPs, when fluorescently tagged dNTPs are used.
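The idea of screening out conserved positions can be illustrated with a much simpler measure than the evolutionary trace method itself. The sketch below scores each column of a multiple sequence alignment by its Shannon entropy and reports the variable columns; it does not use a phylogenetic tree, as the evolutionary trace method does, and is offered only as a simplified stand-in. The toy alignment, the entropy threshold and the function names are hypothetical.

```python
from collections import Counter
from math import log2


def column_entropy(column):
    """Shannon entropy (bits) of the residues observed in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return -sum((n / total) * log2(n / total) for n in counts.values())


def variable_positions(alignment, min_entropy=1.0):
    """Return 0-based column indices whose entropy is at least min_entropy.

    alignment: list of equal-length aligned sequences (gaps are treated as
    ordinary characters in this simplified sketch).
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    return [
        i
        for i in range(length)
        if column_entropy([seq[i] for seq in alignment]) >= min_entropy
    ]


if __name__ == "__main__":
    # Hypothetical toy alignment, for illustration only.
    toy_alignment = ["MKLA", "MRLS", "MKLT", "MQLG"]
    print(variable_positions(toy_alignment))
```

Positions flagged as variable would still need to be checked against the structural criteria described above before being considered for mutation and labeling.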

Once tagged mutant polymerases are prepared and energy minimized in a full solvent environment, estimates of the effect on the structure of the polymerase due to the mutation and/or labeling are generated to provide information about relative tag positioning and separation. These data are then used to estimate FRET efficiencies prior to measurement. Of course, if the dNTPs are tagged with quenchers, then these considerations are not as important.

Another aspect of this invention involves the construction of molecular mechanics force field parameters for atomic and/or molecular tags, such as the fluorescent tags used to tag the dNTPs and the polymerase, and parameters for the fluorescently tagged amino acid on the polymerase and/or dNTP. Force field parameters are developed using quantum mechanical studies to obtain partial charge distributions and energies for relevant intramolecular conformations (i.e., for the dihedral angle definitions) derived from known polymerase crystal structures.

Ionization states of each ionizable residue are estimated using an electrostatic model in which the protein is treated as a low dielectric region and the solvent as a high dielectric, using the UHBD program. See, e.g., Antosiewicz et al., 1994; Briggs and Antosiewicz, 1999; Madura et al., 1995. The electrostatic free energies of ionization of each ionizable residue are computed by solving the Poisson-Boltzmann equation for each residue. These individual ionization free energies are modified to take into account coupled titration behavior, resulting in a set of self-consistent predicted ionization states. These predicted ionization free energies are then recalculated so that shifts in ionization caused by the binding of a dNTP are taken into account. Unexpected ionization states are subject to further computational and experimental studies, leading to a set of partial charges for each residue in the protein, i.e., each ionizable residue in the protein can have a different charge state depending on the type of attached tag or amino acid substitution.

To further aid in amino acid site selection, an electrostatic potential map is generated from properties of the molecular surface of the Taq polymerase/DNA complex, screened by solvent and, optionally, by dissolved ions (i.e., ionic strength) using mainly the UHBD program. The map provides guidance about binding locations for the dNTPs and the electrostatic environment at proposed mutation/labeling sites.

The molecular models generated are designed to be continually refined taking into account new experimental data, allowing the construction of improved molecular models, improved molecular dynamics calculations and improved force field parameters so that the models better predict system behavior for refining tag chemistry and/or tag positioning, predicting new polymerase mutants, base incorporation rates and polymerase fidelity.

Molecular docking simulations are used to predict the docked orientation of the natural and fluorescently labeled dNTPs within the polymerase binding pocket. The best-docked configurations are energy minimized in the presence of an explicit solvent environment. In conjunction with amino acid sites in the polymerase selected for labeling, the docking studies are used to analyze how the tags interact and to predict FRET efficiency for each selected amino acid site.

With the exception of the electrostatics calculations, all docking, quantum mechanics, molecular mechanics, and molecular dynamics calculations are and will be performed using the HyperChem (v6.0) computer program. The HyperChem software runs on PCs under a Windows operating system. A number of computer programs for data analysis or for FRET prediction (as described below) are and will be written on a PC using the Linux operating system and the UHBD program running under Linux.

Analysis of Polymerase Structures

Co-crystal structures solved for DNA polymerase I (DNA pol I) from E. coli, T. aquaticus, B. stearothermophilus, T7 bacteriophage, and human pol a demonstrate that (replicative) polymerases share mechanistic and structural features. The structures that capture Taq DNA polymerase in an ‘open’ (non-productive) conformation and in a ‘closed’ (productive) conformation are of particular importance for identifying regions of the polymerase that undergo changes during base incorporation. The addition of the nucleotide to the polymerase/primer/template complex is responsible for the transition from its open to its closed conformation. Comparison of these structures provides information about the conformational changes that occur within the polymerase during nucleotide incorporation. Specifically, in the closed conformation, the tip of the fingers domain is rotated inward by 46°, thereby positioning the dNTP at the 3′ end of the primer strand in the polymerase active site. The geometry of this terminal base pair is precisely matched with that of its binding pocket. The binding of the correct, complementary base facilitates formation of the closed conformation, whereas incorrect dNTP binding does not induce this conformational change. Reaction chemistry occurs when the enzyme is in the closed conformation.

Referring now to FIG. 2, the open and closed ternary complex forms of the large fragment of Taq DNA pol I (Klentaq 1) are shown in a superimposition of their Cα tracings. The ternary complex contains the enzyme, the ddCTP and the primer/template duplex DNA. The open structure is shown in magenta and the closed structure is shown in yellow. The disorganized appearance in the upper left portion of the protein shows movement of the ‘fingers’ domain in open and closed conformations.

Lists of the 20 amino acid sites that undergo the largest change in position, and that are therefore candidates for mutation and labeling, were identified using a program that determines the change in position of amino acids between the open and closed conformations of the polymerase relative to the gamma phosphate of a bound ddGTP. The program was applied to two different crystal structures of the Taq polymerase containing the primer and bound ddGTP. For each amino acid, the distances between its alpha and beta carbon atoms and the gamma phosphate group of the bound ddGTP were calculated. Lists derived from the two different sets of crystallographic data for the Taq polymerase are given in Tables I, II, III and IV below.

TABLE I
The 20 Amino Acid Sites Undergoing the Largest Positional Change in 2ktq Data Between the Open Form of the Polymerase to the Closed Form of the Polymerase Relative to the Alpha Carbon of the Residue

Residue Location    Residue Identity    Change in Distance (Å)
517                 Alanine             9.10
516                 Alanine             6.86
515                 Serine              6.53
513                 Serine              6.40
518                 Valine              5.12
514                 Threonine           3.94
488                 Asparagine          3.73
487                 Arginine            3.50
489                 Glutamine           3.13
495                 Phenylalanine       3.05
491                 Glutamic acid       2.90
486                 Serine              2.78
490                 Leucine             2.62
586                 Valine              2.61
492                 Arginine            2.60
462                 Glutamic acid       2.59
483                 Asparagine          2.47
685                 Proline             2.46
587                 Arginine            2.44
521                 Alanine             2.38

TABLE II
The 20 Amino Acid Sites Undergoing the Largest Positional Change in 2ktq Data Between the Open Form of the Polymerase to the Closed Form of the Polymerase Relative to the Beta Carbon of the Residue

Residue Location    Residue Identity    Change in Distance (Å)
517                 Alanine             10.98
516                 Alanine             9.05
515                 Serine              8.02
513                 Serine              7.46
518                 Valine              5.47
685                 Proline             5.16
487                 Arginine            4.24
495                 Phenylalanine       3.94
488                 Aspartic acid       3.88
520                 Glutamic acid       3.66
491                 Glutamic acid       3.41
587                 Arginine            3.39
521                 Alanine             3.33
498                 Leucine             3.21
489                 Glutamine           3.08
514                 Threonine           2.97
581                 Leucine             2.93
483                 Asparagine          2.92
497                 Glutamic acid       2.91
462                 Glutamic acid       2.83

TABLE III
The 20 Amino Acid Sites Undergoing the Largest Positional Change in 3ktq Data Between the Open Form of the Polymerase to the Closed Form of the Polymerase Relative to the Alpha Carbon of the Residue

Residue Location    Residue Identity    Change in Distance (Å)
517                 Alanine             8.95
656                 Proline             8.75
657                 Leucine             8.59
655                 Aspartic acid       8.05
660                 Arginine            7.35
658                 Methionine          7.06
659                 Arginine            6.69
654                 Valine              6.60
513                 Serine              6.59
516                 Alanine             6.57
515                 Serine              6.36
653                 Alanine             6.16
661                 Alanine             5.94
652                 Glutamic acid       5.44
647                 Phenylalanine       5.25
649                 Valine              5.22
518                 Valine              5.15
644                 Serine              5.08
643                 Alanine             5.01
650                 Proline             4.72

TABLE IV
The 20 Amino Acid Sites Undergoing the Largest Positional Change in 3ktq Data Between the Open Form of the Polymerase to the Closed Form of the Polymerase Relative to the Beta Carbon of the Residue

Residue Location    Residue Identity    Change in Distance (Å)
517                 Alanine             10.85
656                 Proline             9.05
657                 Leucine             8.75
516                 Alanine             8.68
655                 Aspartic acid       8.24
515                 Serine              7.92
660                 Arginine            7.89
513                 Serine              7.60
659                 Arginine            6.98
658                 Methionine          6.77
654                 Valine              6.25
653                 Alanine             6.14
661                 Alanine             6.04
643                 Alanine             5.74
649                 Valine              5.55
647                 Phenylalanine       5.45
518                 Valine              5.42
652                 Glutamic acid       5.13
644                 Serine              4.89
487                 Arginine            4.77

The above listed amino acids represent preferred amino acid sites for cysteine replacement and subsequent tag attachment, because these sites represent the sites in the Taq polymerase that undergo significant changes in position during base incorporation.
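The kind of ranking reported in Tables I-IV can be sketched as follows. This is a minimal illustration, not the program actually used by the inventors: it assumes that the open and closed structures have already been placed in a common coordinate frame, that one reference atom (alpha or beta carbon) per residue has been extracted, and that the change is scored as the absolute difference between the open and closed atom-to-gamma-phosphate distances. The function name and the toy coordinates are hypothetical.

```python
import math


def rank_by_distance_change(open_atoms, closed_atoms, gamma_phosphate, top_n=20):
    """Rank residues by |d_open - d_closed|, where d is the distance from the
    residue's reference atom to the gamma phosphate of the bound nucleotide.

    open_atoms / closed_atoms: dict mapping residue number -> (x, y, z) in angstroms.
    gamma_phosphate: (x, y, z) of the gamma phosphate in the same frame (assumption).
    """
    changes = []
    for residue, open_xyz in open_atoms.items():
        if residue not in closed_atoms:
            continue  # skip residues not resolved in both structures
        d_open = math.dist(open_xyz, gamma_phosphate)
        d_closed = math.dist(closed_atoms[residue], gamma_phosphate)
        changes.append((residue, abs(d_open - d_closed)))
    # Largest positional change first.
    changes.sort(key=lambda item: item[1], reverse=True)
    return changes[:top_n]


if __name__ == "__main__":
    # Toy coordinates for illustration only; not taken from any crystal structure.
    gamma_p = (0.0, 0.0, 0.0)
    open_ca = {517: (12.0, 0.0, 0.0), 518: (8.0, 0.0, 0.0), 600: (5.0, 0.0, 0.0)}
    closed_ca = {517: (3.0, 0.0, 0.0), 518: (3.0, 0.0, 0.0), 600: (5.0, 0.0, 0.0)}
    for residue, change in rank_by_distance_change(open_ca, closed_ca, gamma_p, top_n=2):
        print(f"residue {residue}: change {change:.2f} A")
```

Running the same routine once with alpha-carbon coordinates and once with beta-carbon coordinates would give paired lists analogous to Tables I/II and III/IV.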

To further refine the amino acid site selection, visualization of the polymerase in its open and closed conformational extremes for these identified amino acid sites is used so that the final selected amino acid sites maximize signal and minimize background noise when modified to carry fluorescent tags for analysis using the FRET methodology. Amino acid changes that are not predicted to significantly affect the protein's secondary structure or activity make up a refined set of amino acid sites in the Taq polymerase for mutagenesis and fluorescent modification so that the tag is shielded from interaction with free dNTPs. The following three panels illustrate the protocol used in this invention to refine amino acid site selection from the above list of amino acids that undergo the largest change in position relative to a bound ddGTP as the polymerase transitions from the open to the closed form.

Referring now to FIGS. 3A-C, an overlay of 3ktq (closed, ‘black’) and 1tau (open, ‘light blue’) for the large fragment of Taq DNA polymerase I is shown. Looking at FIG. 3A, the bound DNA from 3ktq is shown in red while the ddCTP bound to 3ktq is in green. Three residues were visually identified as moving the most when the polymerase goes from open (1tau) to closed (3ktq), namely, Asp655, Pro656, and Leu657. Based on further analyses of the structures, Pro656 appears to have the role of capping the O-helix. Leu657's side chain is very close to another part of the protein in the closed (3ktq) form. Addition of a larger side chain/tag at this site is thought to diminish the ability of the polymerase to achieve a fully closed, active conformation. Conversely, Asp655 is entirely solvent exposed in both the closed and open conformations of the polymerase. Looking at FIG. 3B, a close-up view of the active site from the overlay of the 3ktq (closed) and 1tau (open) conformations of Taq polymerase is shown. The large displacements between the open and closed conformations are evident. Looking at FIG. 3C, a close-up view of a molecular surface representation of 3ktq (in the absence of DNA and ddCTP) is shown. The molecular surface is colored in two areas, blue for Asp655 and green for Leu657. In this representation, it is evident that Leu657 is in close proximity to another part of the protein, because the green part of the molecular surface, in the thumb domain, is “connected” to a part of the fingers domain. This view shows this region of the polymerase looking into the palm of the hand with fingers to the right and thumb to the left.

Mutagenesis and Sequencing of Polymerase Variants

The gene encoding Taq DNA polymerase was obtained and will be expressed in pTTQ18 in E. coli strain DH1. See, e.g., Engelke et al., 1990. The inventors have identified candidate amino acids for mutagenesis including the amino acids in Tables I-IV, the refined lists or mixtures or combinations thereof. Using standard molecular methods well known in the art, the inventors introduced a cysteine codon, individually, at each of the target amino acid sites. See, e.g., Sambrook et al., 1989 and Allen et al., 1998. DNA is purified from isolated colonies expressing the mutant polymerase, sequenced using dye-terminator fluorescent chemistry, detected on an ABI PRISM 377 Automated Sequencer, and analyzed using Sequencher™ available from GeneCodes, Inc.

Expression and Purification of Enzyme Variants

The inventors have demonstrated that the Taq polymerase is capable of incorporating γ-tagged dNTPs to synthesize extended DNA sequences. The next step involves the construction of mutants capable of carrying a tag designed to interact with the tags on the dNTPs and optimization of the polymerase for single-molecule sequencing. The mutants are constructed using standard site-specific mutagenesis as described above and in the experimental section. The constructs are then inserted into and expressed in E. coli. Mutant Taq polymerase is then obtained after sufficient E. coli is grown for subsequent polymerase isolation and purification.

Although E. coli can be grown to optical densities exceeding 100 by computer-controlled feedback-based supply of non-fermentation substrates, the resulting three kg of E. coli cell paste will be excessive during polymerase optimization. Of course, when optimized polymerase constructs are prepared, then this large scale production will be used. During the development of optimized polymerases, the mutants are derived from E. coli cell masses grown in 10 L well-oxygenated batch cultures using a rich medium available from Amgen. For fast polymerase mutant screening, the mutants are prepared by growing E. coli in 2 L baffled shake flasks. Cell paste is then harvested using a 6 L preparative centrifuge, lysed by French press, and cleared of cell debris by centrifugation. To reduce interference from E. coli nucleic acid sequences, it is preferable to also remove other nucleic acids. Removal is achieved using either nucleases (with subsequent heat denaturation of the nuclease) or, preferably, using a variation of the compaction agent-based nucleic acid precipitation protocol as described in Murphy et al., Nature Biotechnology 17, 822, 1999.

Because the thermal stability of Taq polymerase is considerably greater than that of typical E. coli proteins, purification of Taq polymerase or its mutants from contaminating E. coli proteins is achieved by a simple heat treatment of the crude polymerase at 75° C. for 60 minutes, which reduces E. coli protein contamination by approximately 100-fold. This reduction in E. coli protein contamination, combined with the high initial expression level, produces nearly pure Taq polymerase or its mutants in a convenient initial step; provided, of course, that the mutant polymerase retains the thermal stability of the native polymerase.

For routine sequencing and PCR screening, further limited purification is generally required. A single anion-exchange step, typically on Q Sepharose at pH 8.0, is generally sufficient to produce a product pure enough for these tests. Preferably, a second purification step will also be performed to ensure that contamination does not cloud the results of subsequent testing. The second purification step involves SDS-PAGE and CD-monitored melting experiments.

Selection of Site in dNTP to Accept Fluorescent Tag

Molecular docking simulations were carried out to predict the docked orientation of the natural and fluorescently labeled dNTPs using the AutoDock computer program (Morris et al., 1998; Soares et al., 1999). Conformational flexibility is permitted during the docking simulations, making use of an efficient Lamarckian genetic algorithm implemented in the AutoDock program. A subset of protein side chains is also allowed to move to accommodate the dNTP as it docks. The best docked configurations are then energy minimized in the presence of a solvent environment. Experimental data are available which identify amino acids in the polymerase active site that are involved in catalysis and in contact with the template/primer DNA strands or the dNTP to be incorporated. Computer-aided chemical modeling such as docking studies can be used to identify and support sites in the dNTP that can be labeled and to predict the FRET efficiency of dNTPs carrying a specific label at a specific site.

In general, the dNTPs are tagged either by reacting a dNTP with a desired tag or by reacting a precursor such as the pyrophosphate group or the base with a desired tag and then completing the synthesis of the dNTP.

Chemical Modification of Nucleotides for DNA Polymerase Reactions

The inventors have developed syntheses for modifying fluorophore and fluorescence energy transfer compounds to have distinct optical properties for differential signal detection, for nucleotide/nucleoside synthons for incorporation of modifications on base, sugar or phosphate backbone positions, and for producing complementary sets of four deoxynucleotide triphosphates (dNTPs) containing substituents on nucleobases, sugar or phosphate backbone.

Synthesis of γ-Phosphate Modified dNTPs

The inventors have found that the native Taq polymerase is capable of polymerizing phosphate-modified dNTPs or ddNTPs. Again, tagging the dNTPs or ddNTPs at the beta and/or gamma phosphate groups is preferred because the replicated DNA contains no unnatural bases, polymerase activity is not significantly adversely affected and long DNA strands are produced. The inventors have synthesized γ-ANS-phosphate dNTPs, where the ANS is attached to the phosphate through a phosphamide bond. Although these tagged dNTPs are readily incorporated by the native Taq polymerase and by HIV reverse transcriptase, ANS is only one of a wide range of tags that can be attached through the β and/or γ phosphate groups.

The present invention uses tagged dNTPs or ddNTPs in combination with polymerase for signal detection. The dNTPs are modified at phosphate positions (alpha, beta and/or gamma) and/or other positions of nucleotides through a covalent bond or affinity association. The tags are designed to be removed from the base before the next monomer is added to the sequence. One method for removing the tag is to place the tag on the gamma and/or beta phosphates. The tag is removed as pyrophosphate dissociates from the growing DNA sequence. Another method is to attach the tag to a position on the monomer through a cleavable bond. The tag is then removed after incorporation and before the next monomer incorporation by cleaving the cleavable bond using light, a chemical bond cleaving reagent in the polymerization medium, and/or heat.

One generalized synthetic route for synthesizing other γ-tagged dNTPs is given below:

where FR is a fluorescent tag, L is a linker group, X is either H or a counterion depending on the pH of the reaction medium, Z is a group capable of reaction with the hydroxyl group of the pyrophosphate and Z′ is the group after reaction with the dNMP. Preferably, Z is Cl, Br, I, OH, SH, NH₂, NHR, CO₂H, CO₂R, SiOH, SiOR, GeOH, GeOR, or similar reactive functional groups, and Z′ is O, NH, NR, CO₂, SiO, GeO, where in each case R is an alkyl, aryl, aralkyl, alkaryl, halogenated analogs thereof or hetero atom analogs thereof.

The synthesis involves reacting a Z-terminated fluorescent tag, FR-L-Z, with a pyrophosphate group, P₂O₆X₃H, in DCC and dichloromethane to produce a fluorescent tagged pyrophosphate. After the fluorescent tagged pyrophosphate is prepared, it is reacted with a morpholine terminated dNMP in acidic THF to produce a dNTP having a fluorescent tag on its γ-phosphate. Because the final reaction product bears a fluorescent tag and is larger than the starting materials, separation from unmodified starting material and tagged pyrophosphate is straightforward.

A generalized synthesis of a FR-L group is shown below:

Fluorescein (FR) is first reacted with isobutyryl anhydride in pyridine in the presence of diisopropyl amine to produce a fluorescein having both ring hydroxy groups protected for subsequent linker attachment. The hydroxy protected fluorescein is then reacted with N-hydroxysuccinimide in DCC and dichloromethane, followed by the addition of 1-hydroxy-6-amino hexane, to produce a hydroxy terminated FR-L group. This group can then be reacted either with pyrophosphate to tag the dNTPs at their γ-phosphate group or to tag amino acids. See, e.g., Ward et al., 1987; Engelhardt et al., 1993; Little et al., 2000; Hobbs, 1991.

By using different fluorescent tags on each dNTP, tags can be designed so that each tag emits a distinguishable emission spectrum. The emission spectra can be distinguished by producing tags with non-overlapping emission frequencies (multicolor), or each tag can have a non-overlapping spectral feature such as a unique emission band, a unique absorption band and/or a unique intensity feature. Systems that use a distinguishable tag on each dNTP improve confidence values associated with the base calling algorithm.

The synthetic scheme shown above for fluorescein is adaptable to other dyes as well, such as tetrachlorofluorescein (JOE) or N,N,N′,N′-tetramethyl-6-carboxyrhodamine (TAMRA). Typically, the gamma phosphate tagged reactions are carried out in basic aqueous solutions and a carbodiimide, such as DEC. Other fluorophore molecules and dNTPs can be similarly modified.

Synthesis of dNTPs Tagged on the Base

Although tagging the dNTPs at the beta and/or gamma phosphate is preferred, the dNTPs can also be tagged on the base and/or sugar moieties while maintaining their polymerase reaction activity. The sites for modifications are preferably selected to not interfere with Watson-Crick base pairing. A generalized scheme for base modification is shown below:

Polymerase Activity Assays Using a Fluorescently-Tagged Enzyme

The activities of polymerase variants are monitored throughout polymerase development. Polymerase activity is assayed after a candidate amino acid is mutated to cysteine and after fluorescent tagging of the cysteine. The assay used to monitor the ability of the native Taq polymerase to incorporate fluorescently-tagged dNTPs is also used to screen polymerase variants. Since the mutant Taq polymerases have altered amino acid sequences, the assays provide mutant characterization data such as thermostability, fidelity, polymerization rate, and affinity for modified versus natural bases.

Mutant Taq polymerase activity assays are carried out under conditions similar to those used to examine the incorporation of fluorescently-tagged dNTPs into DNA polymers by the native Taq polymerase. To examine mutant Taq polymerase activity, the purified mutant Taq polymerase is incubated in polymerase reaction buffer with a 5′-³²P end-labeled primer/single-stranded template duplex, and appropriate tagged dNTP(s). The polymerase's ability to incorporate a fluorescently-tagged dNTP is monitored by assaying the relative amount of fluorescence associated with the extended primer on either an ABI 377 DNA Sequencer (for fluorescently tagged bases), a Fuji BAS1000 phosphorimaging system, or other appropriate or similar detectors or detection systems. This assay is used to confirm that the mutant polymerase incorporates tagged dNTP and to confirm that fluorescent signatures are obtained during base incorporation. These assays use an end-labeled primer, the fluorescently-tagged dNTP and the appropriate base beyond the fluorescent tag. The products are then size separated and analyzed for extension. Reactions are either performed under constant temperature reaction conditions or thermocycled, as necessary.

Primer Extension Assays

The ability of Taq DNA polymerase to incorporate a γ-phosphate dNTP variant is assayed using conditions similar to those developed to examine single base incorporation by a fluorescently-tagged DNA polymerase. See, e.g., Furey et al., 1998. These experiments demonstrate that polymerases bearing a fluorescent tag do not a priori have reduced polymerization activity. The inventors have demonstrated that the native Taq polymerase incorporates γ-tagged dNTPs, singly or collectively, to produce long DNA chains.

To examine polymerase activity, the polymerase is incubated in polymerase reaction buffer, such as Taq DNA polymerase buffer available from Promega Corporation of Madison, Wis., with either a 5′-³²P or a fluorescently end-labeled primer (TOP)/single-stranded template (BOT-‘X’) duplex, and appropriate dNTP(s) as shown in Table V. Reactions are carried out either at constant temperature or thermocycled, as desired or as is necessary. Reaction products are then size-separated and quantified using a phosphorimaging or fluorescent detection system. The relative efficiency of incorporation for each tagged dNTP is determined by comparison with its natural counterpart.

TABLE V

Primer Strand:
TOP        5′ GGT ACT AAG CGG CCG CAT G 3′          (SEQ ID NO: 1)

Template Strands:
BOT-T      3′ CCA TGA TTC GCC GGC GTA CTC 5′        (SEQ ID NO: 2)
BOT-C      3′ CCA TGA TTC GCC GGC GTA CCC 5′        (SEQ ID NO: 3)
BOT-G      3′ CCA TGA TTC GCC GGC GTA CGC 5′        (SEQ ID NO: 4)
BOT-A      3′ CCA TGA TTC GCC GGC GTA CAC 5′        (SEQ ID NO: 5)
BOT-3T     3′ CCA TGA TTC GCC GGC GTA CTT TC 5′     (SEQ ID NO: 6)
BOT-Sau    3′ CCA TGA TTC GCC GGC GTA CCT AG 5′     (SEQ ID NO: 7)

In Table V, ‘TOP’ represents the primer strand of an assay DNA duplex. Variants of the template strand are represented by ‘BOT’. The relevant feature of the DNA template is indicated after the hyphen. For example, BOT-T, BOT-C, BOT-G, BOT-A are used to monitor polymerase incorporation efficiency and fidelity for either nucleotides or nucleotide variants of dA, dG, dC, and dT, respectively.
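The relationship between each template name and the nucleotide it tests can be checked with a short sketch. The sequences are those of Table V; the function name and dictionary layout are illustrative assumptions. The routine confirms that the primer anneals to the 3′ end of the template and then reports the bases, in order of incorporation, that the polymerase must add to copy the single-stranded overhang.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Primer written 5'->3' and templates written 3'->5', exactly as listed in Table V.
TOP = "GGTACTAAGCGGCCGCATG"
BOT = {
    "BOT-T": "CCATGATTCGCCGGCGTACTC",
    "BOT-C": "CCATGATTCGCCGGCGTACCC",
    "BOT-G": "CCATGATTCGCCGGCGTACGC",
    "BOT-A": "CCATGATTCGCCGGCGTACAC",
    "BOT-3T": "CCATGATTCGCCGGCGTACTTTC",
    "BOT-Sau": "CCATGATTCGCCGGCGTACCTAG",
}


def bases_to_incorporate(primer_5to3: str, template_3to5: str) -> str:
    """Return the bases (in order of addition) needed to copy the template overhang."""
    annealed = template_3to5[: len(primer_5to3)]
    expected = "".join(COMPLEMENT[base] for base in primer_5to3)
    if annealed != expected:
        raise ValueError("primer is not complementary to the 3' end of the template")
    overhang = template_3to5[len(primer_5to3):]
    return "".join(COMPLEMENT[base] for base in overhang)


if __name__ == "__main__":
    for name, template in BOT.items():
        added = bases_to_incorporate(TOP, template)
        print(f"{name}: first base d{added[0]}, full extension {added}")
```

For BOT-Sau the extension reads GATC, so a natural dG is incorporated before the dA position, which matches the use of this template as a positive control.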

Preliminary assays are performed prior to exhaustive purification of the tagged dNTP to ensure that the polymerase is not inhibited by a chemical that co-purifies with the tagged dNTP, using the ‘BOT-Sau’ template. The ‘BOT-Sau’ template was designed to monitor incorporation of natural dGTP prior to tagged dATP (i.e., a positive control for polymerase activity). More extensive purification is then performed for promising tagged nucleotides. Similarly, experiments are carried out to determine whether the polymerase continues extension following incorporation of the tagged dNTPs, individually or collectively, using the same end-labeled ‘TOP’ primer, the appropriate ‘BOT’ template, the fluorescently-tagged dNTP, and the appropriate base 3′ of the tagged nucleotide. The products are then size-separated and analyzed to determine the relative extension efficiency.
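One plausible way to express the relative efficiency from size-separated products is sketched below. It assumes that the quantified signals of the unextended and extended primer bands are available for both the tagged dNTP and its natural counterpart, and it defines relative efficiency as the ratio of the extended fractions. Both the definition and the example band intensities are assumptions made for illustration; the disclosure does not specify a formula.

```python
def extended_fraction(unextended_signal: float, extended_signal: float) -> float:
    """Fraction of labeled primer that was extended, from two band intensities."""
    total = unextended_signal + extended_signal
    if total <= 0:
        raise ValueError("total band signal must be positive")
    return extended_signal / total


def relative_efficiency(tagged_bands, natural_bands) -> float:
    """Ratio of extended fractions: tagged dNTP versus its natural counterpart.

    Each argument is a (unextended_signal, extended_signal) pair.
    """
    natural_fraction = extended_fraction(*natural_bands)
    if natural_fraction == 0:
        raise ValueError("natural control shows no extension; ratio is undefined")
    return extended_fraction(*tagged_bands) / natural_fraction


if __name__ == "__main__":
    # Hypothetical phosphorimager counts (arbitrary units).
    tagged = (60.0, 40.0)
    natural = (20.0, 80.0)
    print(f"relative efficiency = {relative_efficiency(tagged, natural):.2f}")
```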

Assay Fidelity of γ-Phosphate Tagged Nucleotide Incorporation

The Taq DNA polymerase lacks 3′ to 5′ exonuclease activity (proofreading activity). If the polymerase used in single-molecule DNA sequencing possessed a 3′ to 5′ exonuclease activity, the polymerase would be capable of adding another base to replace one that would be removed by the proofreading activity. This newly added base would produce a signature fluorescent signal evidencing the incorporation of an additional base in the template, resulting in a misidentified DNA sequence, a situation that would render the single-molecule sequencing systems of this invention problematic.

If the error rate for the incorporation of modified dNTPs exceeds a threshold level of about 1 error in 100, the sequencing reactions are preferably run in parallel, with the optimal number required to produce sequence information with a high degree of confidence for each base call determined by the error rate. Larger error rates require more parallel runs, while smaller error rates require fewer parallel runs. In fact, if the error rate is low enough, generally less than 1 error in 1,000, preferably 1 error in 5,000 and particularly 1 error in 10,000 incorporated bases, then no parallel runs are required. Insertions or deletions are, potentially, more serious types of errors and warrant a minimal redundancy of 3 repeats per sample. If 2 reactions were run, one could not be certain which was correct. Thus, 3 reactions are needed for the high quality data produced by this system.
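The trade-off between per-base error rate and the number of parallel runs can be illustrated with a simple majority-vote calculation. This sketch assumes that errors in separate runs are independent and, as a worst case, that all erroneous runs agree on the same wrong call; neither assumption comes from the disclosure, and the function name is hypothetical. An odd number of runs is required so that a majority always exists, which mirrors the observation that two reactions cannot resolve a disagreement.

```python
from math import comb


def majority_error(per_base_error: float, runs: int) -> float:
    """Probability that a majority of independent runs miscall a given base."""
    if not 0.0 <= per_base_error <= 1.0:
        raise ValueError("per_base_error must be between 0 and 1")
    if runs < 1 or runs % 2 == 0:
        raise ValueError("use an odd number of runs so that a majority always exists")
    needed = runs // 2 + 1  # smallest number of wrong runs that forms a majority
    return sum(
        comb(runs, k) * per_base_error ** k * (1.0 - per_base_error) ** (runs - k)
        for k in range(needed, runs + 1)
    )


if __name__ == "__main__":
    for error_rate in (1e-2, 1e-3, 1e-4):
        for runs in (1, 3, 5):
            print(f"error rate {error_rate:g}, {runs} run(s): "
                  f"consensus error {majority_error(error_rate, runs):.3g}")
```

Under these assumptions, three runs at 1 error in 100 give a consensus error of roughly 3 in 10,000, which is comparable to the single-run rates described above as needing no parallel runs.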

The BOT-variant templates are used to characterize the accuracy with which each γ-tagged dNTP is incorporated by an engineered polymerase, as set forth in Table V. Oligonucleotides serve as DNA templates, each differing only in the identity of the first base incorporated. Experiments using these templates are used to examine the relative incorporation efficiency of each base and the ability of the polymerase to discriminate between the tagged dNTPs. Initially, experiments with polymerase variants are carried out using relatively simple-sequence, single-stranded DNA templates. A wide array of sequence-characterized templates is available from the University of Houston in Dr. Hardin's laboratory, including a resource of over 300 purified templates. For example, one series of templates contains variable length polyA or polyT sequences. Additional defined-sequence templates are constructed as necessary, facilitating the development of the base-calling algorithms.

Relative Fluorescence Intensity Assays

Direct detection of polymerase action on the tagged dNTP is obtained by solution fluorescence measurements, using a SPEX 212 instrument or a similar instrument. This instrument was used to successfully detect fluorescent signals from ANS tagged γ-phosphate dNTPs being incorporated by Taq polymerase at nanomolar concentration levels. The SPEX 212 instrument includes a 450 watt xenon arc source, dual emission and dual excitation monochromators, a cooled PMT (recently upgraded to simultaneous T-format anisotropy data collection), and a Hi-Tech stopped-flow accessory. This instrument is capable of detecting an increase in fluorescence intensity and/or change in absorption spectra upon liberation of the tagged pyrophosphate from ANS tagged γ-phosphate dNTPs, as was verified for ANS-pyrophosphate released by Taq and RNA polymerase and venom phosphodiesterase.

Experiments have been and are being performed by incubating γ-phosphate tagged dATP or TTP (Control: non-modified dATP and TTP) in an appropriate buffer (e.g., buffers available from Promega Corporation) in the presence of polymerase (Control: no enzyme) and DNA primer/template [poly(dA)·poly(dT)] (Control: no primer/template DNA). When the polymerase incorporates a tagged dNTP, changes in fluorescence intensity and/or frequency, absorption and/or emission spectra, and DNA polymer concentration are detected. Changes in these measurables as a function of time and/or temperature for experimental versus control cuvettes allow for unambiguous determination of whether a polymerase is incorporating the γ-phosphate tagged dNTP. Excitation and fluorescence emission can be optimized for each tagged dNTP based on changes in these measurables.

Development of a Single-Molecule Detection System

The detection of fluorescence from single molecules is preferably carried out using microscopy. Confocal-scanning microscopy can be used in this application, but a non-scanning approach is preferred. Microscopes useful for detecting fluorescent signals due to polymerase activity include any type of microscope, with oil-immersion type microscopes being preferred. The microscopes are preferably located in an environment in which vibration and temperature variations are controlled, and fitted with a highly-sensitive digital camera. While many different cameras can be used to record the fluorescent signals, the preferred cameras are intensified CCD type cameras such as the iPentaMax from Princeton Instruments.

The method of detection involves illuminating the samples at wavelengths sufficient to induce fluorescence of the tags, preferably in an internal-reflection format. If the fluorescent tags are a donor-acceptor pair, then the excitation frequency must be sufficient to excite the donor. Although any type of light source can be used, the preferred light source is a laser. It will often be advantageous to image the same sample in multiple fluorescence emission wavelengths, either in rapid succession or simultaneously. For simultaneous multi-color imaging, an image splitter is preferred to allow the same CCD to collect all of the color images simultaneously. Alternatively, multiple cameras can be used, each viewing the sample through emission optical filters of different wavelength specificity.

Tag detection in practice, of course, depends upon many variables including the specific tag used as well as electrical, fluorescent, chemical, physical, electrochemical, mass isotope, or other properties. Single-molecule fluorescence imaging is obtainable employing a research-grade Nikon Diaphot TMD inverted epifluorescence microscope, upgraded with laser illumination and a more-sensitive camera. Moreover, single-molecule technology is a well-developed and commercially available technology. See, e.g., Peck et al., 1989; Ambrose et al., 1994; Goodwin et al., 1997; Brouwer et al., 1999; Castro and Williams, 1997; Davis et al., 1991; Davis et al., 1992; Keller et al., 1996; Michaelis et al., 2000; Orrit and Bernard, 1990; Orrit et al., 1994; Sauer et al., 1999; Unger et al., 1999; Zhuang et al., 2000.

The epifluorescence microscope can be retrofitted for evanescent-wave excitation using an argon ion laser at 488 nm. The inventors have previously used this illumination geometry in assays for nucleic acid hybridization studies. The existing setup has also been upgraded by replacement of the current CCD camera with a 12-bit, 512×512 pixel Princeton Instruments I-PentaMAX generation IV intensified CCD camera, which has been used successfully in a variety of similar single-molecule applications. This camera achieves a quantum efficiency of over 45% in the entire range of emission wavelengths of the dyes to be used, and considerably beyond this range. The vertical alignment of the existing microscope tends to minimize vibration problems, and the instrument is currently mounted on an anti-vibration table.

A preferred high-sensitivity imaging system is based on an Olympus IX70-S8F inverted epifluorescence microscope. The system incorporates low-background components and enables capture of single-molecule fluorescence images at rates of greater than 80 frames per second with a quantum efficiency between 60% and 70% in the range of emission wavelengths of the fluorescently active tags.

In imaging the fluorescence of multiple single molecules, it is preferable to minimize the occurrence of multiple fluorescent emitters within a data collection channel such as a single pixel or pixel-bin of the viewing field of the CCD or other digital imaging system. A finite number of data collection channels such as pixels are available in any given digital imaging apparatus. Randomly-spaced, densely-positioned fluorescent emitters generally produce an increased fraction of pixels or pixel bins that are multiply-occupied and problematic in data analysis. As the density of emitters in the viewing field increases, so does the number of problematic data channels. While multiple occupancy of distinguishable data collection regions within the viewing field can be reduced by reducing the concentration of emitters in the viewing field, this decrease in concentration increases the fraction of data collection channels or pixels that see no emitter at all, thereby leading to inefficient data collection.
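The trade-off described above can be illustrated numerically. The following is a minimal sketch, assuming that randomly placed emitters fall into data collection channels according to a Poisson distribution; that statistical model, the function name, and the sample densities are assumptions made for illustration and are not taken from the disclosure.

```python
import math


def occupancy_fractions(mean_emitters_per_channel):
    """Return (empty, single, multiple) channel fractions under a Poisson model."""
    lam = mean_emitters_per_channel
    empty = math.exp(-lam)            # P(0 emitters in a channel)
    single = lam * math.exp(-lam)     # P(exactly 1 emitter in a channel)
    multiple = 1.0 - empty - single   # P(2 or more emitters in a channel)
    return empty, single, multiple


if __name__ == "__main__":
    # Hypothetical mean densities (emitters per pixel or pixel bin).
    for lam in (0.1, 0.5, 1.0, 2.0):
        empty, single, multiple = occupancy_fractions(lam)
        print(f"mean={lam:.1f}  empty={empty:.3f}  "
              f"single={single:.3f}  multiple={multiple:.3f}")
```

Under this assumed model, no more than about 37% of channels can hold exactly one emitter when placement is random (the maximum occurs at a mean density of one emitter per channel), which is consistent with the motivation for controlling emitter spacing.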

A preferred method for increasing and/or maximizing the data collection efficiency involves controlling the spacing between emitters (tagged polymerase molecules). This spacing is achieved in a number of ways. First, the polymerases can be immobilized on a substrate so that only a single polymerase is localized within each data collection channel or pixel region within the viewing field of the imaging system. The immobilization is accomplished by anchoring the polymerase to a capture agent or linking group chemically attached to the substrate. Capture or linking agents can be spaced to useful distances by choosing inherently large capture agents, by conjugating them with or bonding them to molecules which enhance their steric bulk or electrostatic repulsion bulk, or by immobilizing under conditions chosen to maximize repulsion between polymerizing molecular assemblies (e.g., low ionic strength to maximize electrostatic repulsion).

Alternatively, the polymerase can be associated with proteins that increase the steric bulk of the polymerase or the electrostatic repulsion bulk of the polymerizing system so that each polymerizing molecular assembly cannot approach another any closer than a distance greater than the data channel resolution size of the imaging system.

Polymerase Activity Assays Using a Single-Molecule Detection System

These assays are performed essentially as described for the polymerase activity assays herein. As stated above, the primary difference from assaying polymerase activity for screening purposes is the immobilization of some part of the polymerizing assembly, such as the polymerase, target DNA or a primer-associated protein, to a solid support to enable viewing of individual replication events. A variety of immobilization options are available, including, without limitation, covalent and/or non-covalent attachment of one of the molecular assemblies on a surface such as an organic surface, an inorganic surface, in or on nanotubes or other similar nano-structures and/or in or on porous matrices. These immobilization techniques are designed to provide specific areas for detection of the detectable property, such as fluorescence, NMR, or the like, where the spacing is sufficient to decrease or minimize data collection channels having multiple emitters. Thus, a preferred data collection method for single-molecule sequencing is to ensure that the fluorescently tagged polymerases are spaced apart within the viewing field of the imaging apparatus so that each data collection channel sees the activity of only a single polymerase.

Analysis of Fluorescent Signals from Single-Molecule Sequencing System

The raw data generated by the detector represent between one and four time-dependent fluorescence data streams comprising wavelengths and intensities: one data stream for each fluorescently labeled base being monitored. Base identities and reliabilities are assigned using the PHRED computer program. If needed, the inventors will write computer programs to interpret data streams having partial and overlapping data. In such cases, multiple experiments are run so that confidence limits are assigned to each base identity according to the variation in the reliability indices and the difficulties associated with assembling stretches of sequence from fragments. The reliability indices represent the goodness of the fit between the observed wavelengths and intensities of fluorescence and the ideal values. The result of the signal analyses is a linear DNA sequence with associated probabilities of certainty. Additionally, when required, the data are stored in a database for dynamic querying for identification and comparison purposes. A search term (sequence) of 6-10, 11-16, 17-20, or 21-30 bases can be compared against reference sequences to quickly identify perfectly matched sequences or those sharing a user-defined level of identity.
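The comparison of a short search term against a reference sequence can be sketched as follows. This is only an illustrative, ungapped sliding-window search, not the database query method of the disclosure; the function name, the `min_identity` parameter, and the example sequences are hypothetical.

```python
def find_matches(query, reference, min_identity=1.0):
    """Slide `query` along `reference` and report windows meeting `min_identity`.

    Returns a list of (start_index, identity) tuples, with 0-based start
    positions. Identity is the fraction of matching bases (no gaps allowed).
    """
    query = query.upper()
    reference = reference.upper()
    n = len(query)
    if n == 0:
        raise ValueError("query must not be empty")
    hits = []
    for start in range(len(reference) - n + 1):
        window = reference[start:start + n]
        matches = sum(q == r for q, r in zip(query, window))
        identity = matches / n
        if identity >= min_identity:
            hits.append((start, identity))
    return hits


if __name__ == "__main__":
    reference = "GCCGTGGACCCCCTGATGCGC"  # hypothetical reference fragment
    print(find_matches("GACCCC", reference))                     # perfect matches only
    print(find_matches("GACTCC", reference, min_identity=0.8))   # allow one mismatch in six
```

Setting `min_identity` to 1.0 reports only perfectly matched windows, while lower values correspond to a user-defined level of identity.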

Informatics: Analysis of Fluorescent Signals from the Single-Molecule Detection System

Data collection allows data to be assembled from partial information to obtain sequence information from multiple polymerase molecules in order to determine the overall sequence of the template or target molecule. An important driving force for convolving together results obtained with multiple single molecules is the impossibility of obtaining data from a single molecule over an indefinite period of time. At a typical dye photobleaching efficiency of 2×10⁻⁵, a typical dye molecule is expected to undergo 50,000 excitation/emission cycles before permanent photobleaching. Data collection from a given molecule may also be interrupted by intersystem crossing to an optically inactive (on the time scales of interest) triplet state. Even with precautions against photobleaching, therefore, data obtained from any given molecule are necessarily fragmentary for template sequences of substantial length, and these subsequences are co-processed in order to derive the overall sequence of a target DNA molecule.
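The photobleaching figure quoted above follows from simple arithmetic. The sketch below assumes that each excitation/emission cycle carries an independent, constant probability of permanent photobleaching (a geometric model adopted here for illustration); the function names are hypothetical.

```python
def expected_cycles(bleach_probability_per_cycle):
    """Mean number of excitation/emission cycles before photobleaching."""
    return 1.0 / bleach_probability_per_cycle


def survival_probability(bleach_probability_per_cycle, cycles):
    """Probability that a dye molecule is still emitting after `cycles` cycles."""
    return (1.0 - bleach_probability_per_cycle) ** cycles


if __name__ == "__main__":
    p = 2e-5  # photobleaching efficiency quoted in the text
    print(f"expected cycles before bleaching: {expected_cycles(p):.0f}")
    for n in (10_000, 50_000, 100_000):
        print(f"fraction still emitting after {n} cycles: "
              f"{survival_probability(p, n):.3f}")
```

Under this assumed model, only roughly a third of the molecules are still emitting after the mean number of cycles, which illustrates why data from any single molecule are fragmentary.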

Additionally, in certain instances it is useful to perform reactions with reference controls, similar to microarray assays. Comparison of signal(s) between the reference sequence and the test sample is used to identify differences and similarities in sequences or sequence composition. Such reactions can be used for fast screening of DNA polymers to determine degrees of homology between the polymers, to determine polymorphisms in DNA polymers, or to identify pathogens.
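A reference-versus-sample comparison of this kind can be sketched at the sequence level as follows. This is a minimal illustration that assumes the two sequences are already aligned and of equal length; the function name and the example fragments are hypothetical.

```python
def compare_to_reference(reference, sample):
    """Return (differences, identity) for two aligned, equal-length sequences.

    `differences` is a list of (position, reference_base, sample_base) tuples
    with 1-based positions; `identity` is the fraction of matching positions.
    """
    reference = reference.upper()
    sample = sample.upper()
    if len(reference) != len(sample) or not reference:
        raise ValueError("sequences must be non-empty and of equal length")
    differences = [
        (i + 1, r, s)
        for i, (r, s) in enumerate(zip(reference, sample))
        if r != s
    ]
    identity = 1.0 - len(differences) / len(reference)
    return differences, identity


if __name__ == "__main__":
    # Hypothetical fragments differing at one codon (GTG versus TGT).
    diffs, identity = compare_to_reference("GAGGCCGTGGACCCC", "GAGGCCTGTGACCCC")
    print(diffs)
    print(f"identity: {identity:.2f}")
```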

Examples

Cloning and Mutagenesis of Taq Polymerase

Cloning

Bacteriophage lambda host strain Charon 35 harboring the full-length Thermus aquaticus gene encoding DNA polymerase I (Taq pol I) was obtained from the AMERICAN TYPE CULTURE COLLECTION (ATCC; Manassas, Va.). Taq pol I was amplified directly from the lysate of the infected E. coli host using the following DNA oligonucleotide primers:

Taq Pol I forward (SEQ ID NO: 8)
5′-gcgaattcat gagggggatg ctgcccctct ttgagccc-3′

Taq Pol I reverse (SEQ ID NO: 9)
5′-gcgaattcac cctccttggc ggagcgccag tcctccc-3′

The underlined segment of each synthetic DNA oligonucleotide represents engineered EcoRI restriction sites immediately preceding and following the Taq pol I gene. PCR amplification using the reverse primer described above and the following forward primer created an additional construct with an N-terminal deletion of the gene:

Taq Pol I_A293_trunk (SEQ ID NO: 10)
5′-aatccatggg ccctggagga ggccccctgg cccccgc-3′

The underlined segment corresponds to an engineered NcoI restriction site with the first codon encoding an alanine (ATG start representing an expression vector following the ribosome binding site). Ideally, the full-length and truncated constructs of the Taq pol I gene are ligated into a single EcoRI site (full-length) or into an NcoI/EcoRI-digested pRSET-b expression vector (truncated). E. coli strain JM109 is used as host for all in vivo manipulation of the engineered vectors.
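Because the underlining that marked the engineered restriction sites does not survive in plain text, the sites can be located programmatically. The following is a minimal sketch using the well-known EcoRI (GAATTC) and NcoI (CCATGG) recognition sequences and the three primers listed above; the function and dictionary names are hypothetical.

```python
# Recognition sequences of the two restriction enzymes named in the text.
SITES = {"EcoRI": "GAATTC", "NcoI": "CCATGG"}

# Primer sequences as listed in this section (spaces removed).
PRIMERS = {
    "Taq Pol I forward": "gcgaattcatgagggggatgctgcccctctttgagccc",
    "Taq Pol I reverse": "gcgaattcaccctccttggcggagcgccagtcctccc",
    "Taq Pol I_A293_trunk": "aatccatgggccctggaggaggccccctggcccccgc",
}


def find_sites(primer):
    """Return {enzyme: [0-based start positions]} for each site in `primer`."""
    seq = primer.replace(" ", "").upper()
    hits = {}
    for enzyme, site in SITES.items():
        hits[enzyme] = [
            i for i in range(len(seq) - len(site) + 1)
            if seq.startswith(site, i)
        ]
    return hits


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(name, find_sites(seq))
```

Positions are counted from the 5′ end, starting at zero.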

Mutagenesis

Once a suitable construct is generated, individual cysteine mutations are introduced at preferred amino acid positions including positions 513-518, 643, 647, 649 and 653-661 of the native Taq polymerase having the following amino acid sequence (SEQ ID NO: 11):

(SEQ ID NO. 11) Met Arg Gly Met Leu Pro Leu Phe Glu Pro Lys Gly Arg ValLeu Leu 1               5                   10                  15 ValAsp Gly His His Leu Ala Tyr Arg Thr Phe His Ala Leu Lys Gly            20                  25                  30 Leu Thr Thr SerArg Gly Glu Pro Val Gln Ala Val Tyr Gly Phe Ala        35                  40                  45 Lys Ser Leu Leu LysAla Leu Lys Glu Asp Gly Asp Ala Val Ile Val    50                  55                  60 Val Phe Asp Ala Lys AlaPro Ser Phe Arg His Glu Ala Tyr Gly Gly65                  70                  75                  80 Tyr LysAla Gly Arg Ala Pro Thr Pro Glu Asp Phe Pro Arg Gln Leu                85                  90                  95 Ala Leu IleLys Glu Leu Val Asp Leu Leu Gly Leu Ala Arg Leu Glu            100                 105                 110 Val Pro Gly TyrGlu Ala Asp Asp Val Leu Ala Ser Leu Ala Lys Lys        115                 120                 125 Ala Glu Lys Glu GlyTyr Glu Val Arg Ile Leu Thr Ala Asp Lys Asp    130                 135                 140 Leu Tyr Gln Leu Leu SerAsp Arg Ile His Val Leu His Pro Glu Gly145                 150                 155                 160 Tyr LeuIle Thr Pro Ala Trp Leu Trp Glu Lys Tyr Gly Leu Arg Pro                165                 170                 175 Asp Gln TrpAla Asp Tyr Arg Ala Leu Thr Gly Asp Glu Ser Asp Asn            180                 185                 190 Leu Pro Gly ValLys Gly Ile Gly Glu Lys Thr Ala Arg Lys Leu Leu        195                 200                 205 Glu Glu Trp Gly SerLeu Glu Ala Leu Leu Lys Asn Leu Asp Arg Leu    210                 215                 220 Lys Pro Ala Ile Arg GluLys Ile Leu Ala His Met Asp Asp Leu Lys225                 230                 235                 240 Leu SerTrp Asp Leu Ala Lys Val Arg Thr Asp Leu Pro Leu Glu Val                245                 250                 255 Asp Phe AlaLys Arg Arg Glu Pro Asp Arg Glu Arg Leu Arg Ala Phe            260                 
265                 270 Leu Glu Arg LeuGlu Phe Gly Ser Leu Leu His Glu Phe Gly Leu Leu        275                 280                 285 Glu Ser Pro Lys AlaLeu Glu Glu Ala Pro Trp Pro Pro Pro Glu Gly    290                 295                 300 Ala Phe Val Gly Phe ValLeu Ser Arg Lys Glu Pro Met Trp Ala Asp305                 310                 315                 320 Leu LeuAla Leu Ala Ala Ala Arg Gly Gly Arg Val His Arg Ala Pro                325                 330                 335 Glu Pro TyrLys Ala Leu Arg Asp Leu Lys Glu Ala Arg Gly Leu Leu            340                 345                 350 Ala Lys Asp LeuSer Val Leu Ala Leu Arg Glu Gly Leu Gly Leu Pro        355                 360                 365 Pro Gly Asp Asp ProMet Leu Leu Ala Tyr Leu Leu Asp Pro Ser Asn    370                 375                 380 Thr Thr Pro Glu Gly ValAla Arg Arg Tyr Gly Gly Glu Trp Thr Glu385                 390                 395                 400 Glu AlaGly Glu Arg Ala Ala Leu Ser Glu Arg Leu Phe Ala Asn Leu                405                 410                 415 Trp Gly ArgLeu Glu Gly Glu Glu Arg Leu Leu Trp Leu Tyr Arg Glu            420                 425                 430 Val Glu Arg ProLeu Ser Ala Val Leu Ala His Met Glu Ala Thr Gly        435                 440                 445 Val Arg Leu Asp ValAla Tyr Leu Arg Ala Leu Ser Leu Glu Val Ala    450                 455                 460 Glu Glu Ile Ala Arg LeuGlu Ala Glu Val Phe Arg Leu Ala Gly His465                 470                 475                 480 Pro PheAsn Leu Asn Ser Arg Asp Gln Leu Glu Arg Val Leu Phe Asp                485                 490                 495 Glu Leu GlyLeu Pro Ala Ile Gly Lys Thr Glu Lys Thr Gly Lys Arg            500                 505                 510 Ser Thr Ser AlaAla Val Leu Glu Ala Leu Arg Glu Ala His Pro Ile        515                 520                 525 Val Glu Lys Ile LeuGln Tyr Arg Glu Leu Thr Lys Leu Lys Ser Thr    530                 535   
              540 Tyr Ile Asp Pro Leu ProAsp Leu Ile His Pro Arg Thr Gly Arg Leu545                 550                 555                 560 His ThrArg Phe Asn Gln Thr Ala Thr Ala Thr Gly Arg Leu Ser Ser                565                 570                 575 Ser Asp ProAsn Leu Gln Asn Ile Pro Val Arg Thr Pro Leu Gly Gln            580                 585                 590 Arg Ile Arg ArgAla Phe Ile Ala Glu Glu Gly Trp Leu Leu Val Ala        595                 600                 605 Leu Asp Tyr Ser GlnIle Glu Leu Arg Val Leu Ala His Leu Ser Gly    610                 615                 620 Asp Glu Asn Leu Ile ArgVal Phe Gln Glu Gly Arg Asp Ile His Thr625                 630                 635                 640 Glu ThrAla Ser Trp Met Phe Gly Val Pro Arg Glu Ala Val Asp Pro                645                 650                 655 Leu Met ArgArg Ala Ala Lys Thr Ile Asn Phe Gly Val Leu Tyr Gly            660                 665                 670 Met Ser Ala HisArg Leu Ser Gln Glu Leu Ala Ile Pro Tyr Glu Glu        675                 680                 685 Ala Gln Ala Phe IleGlu Arg Tyr Phe Gln Ser Phe Pro Lys Val Arg    690                 695                 700 Ala Trp Ile Glu Lys ThrLeu Glu Glu Gly Arg Arg Arg Gly Tyr Val705                 710                 715                 720 Glu ThrLeu Phe Gly Arg Arg Arg Tyr Val Pro Asp Leu Glu Ala Arg                725                 730                 735 Val Lys SerVal Arg Glu Ala Ala Glu Arg Met Ala Phe Asn Met Pro            740                 745                 750 Val Gln Gly ThrAla Ala Asp Leu Met Lys Leu Ala Met Val Lys Leu        755                 760                 765 Phe Pro Arg Leu GluGlu Met Gly Ala Arg Met Leu Leu Gln Val His    770                 775                 780 Asp Glu Leu Val Leu GluAla Pro Lys Glu Arg Ala Glu Ala Val Ala785                 790                 795                 800 Arg LeuAla Lys Glu Val Met Glu Gly Val Tyr Pro Leu Ala Val Pro                805     
            810                 815 Leu Glu Val Glu Val Gly Ile Gly Glu Asp Trp Leu Ser Ala Lys Glu            820                 825                 830

The following amino acid residues correspond to the amino acids between amino acid 643 and 661, where xxx represents intervening amino acid residues in the native polymerase:

(SEQ ID NO: 12) 643-Ala xxx xxx xxx Phe xxx Val xxx xxx Glu Ala Val Asp Pro Leu Met Arg Arg Ala-661

Overlapping primers are used to introduce point mutations into the native gene by PCR-based mutagenesis (using Pfu DNA polymerase).

Complementary forward and reverse primers each contain a codon that encodes the desired mutated amino acid residue. PCR using these primers results in a nicked, non-methylated, double-stranded plasmid containing the desired mutation. To remove the template DNA, the entire PCR product is treated with DpnI restriction enzyme (which cuts at methylated adenosines in the sequence GATC). Following digestion of the template plasmid, the mutated plasmid is transformed and ligation occurs in vivo.

The following synthetic DNA oligonucleotide primers are used for mutagenesis as described below, where the letters designated in lowercase have been modified to yield the desired cysteine substitution at the indicated position. Mutants are then screened via automated sequencing.

ALANINE 643 TO CYSTEINE REPLACEMENT
Taq Pol I_Ala643Cys_fwd (SEQ ID NO: 13) 5′-C CAC ACG GAG ACC tgC AGC TGG ATG TTC GGC G-3′
Taq Pol I_Ala643Cys_rev (SEQ ID NO: 14) 5′-C GCC GAA CAT CCA CGA Gca GGT CTC CGT GTG G-3′

PHENYLALANINE 647 TO CYSTEINE REPLACEMENT
Taq Pol I_Phe647Cys_fwd (SEQ ID NO: 15) 5′-CC GCC AGC TGG ATG TgC GGC GTC CCC CGG GAG GCC-3′
Taq Pol I_Phe647Cys_rev (SEQ ID NO: 16) 5′-GGC CTC CCG GGG GAC GCC GcA CAT CCA CGT GGC GG-3′

VALINE 649 TO CYSTEINE REPLACEMENT
Taq Pol I_Val649Cys_fwd (SEQ ID NO: 17) 5′-GCC AGC TGG ATG TTC GGC tgC CCC CGG GAG GCC GTG G-3′
Taq Pol I_Val649Cys_rev (SEQ ID NO: 18) 5′-C CAC GGC CTC CCG GGG Gca GCC GAA CAT CCA GCT GGC-3′

GLUTAMIC ACID 652 TO CYSTEINE REPLACEMENT
Taq Pol I_Glu652Cys_fwd (SEQ ID NO: 19) 5′-GGC GTC CCC CGG tgc GCC GTG GAC CCC CTG ATG CGC-3′
Taq Pol I_Glu652Cys_rev (SEQ ID NO: 20) 5′-GCG CAT CAG GGG GTC CAC GGC gca CCG GGG GAC GCC-3′

ALANINE 653 TO CYSTEINE REPLACEMENT
Taq Pol I_Ala653Cys_fwd (SEQ ID NO: 21) 5′-GGC GTC CCC CGG GAG tgC GTG GAC CCC CTG ATG CGC-3′
Taq Pol I_Ala653Cys_rev (SEQ ID NO: 22) 5′-GCG CAT CAG GGG GTC CAC Gca CTC CCG GGG GAC GCC-3′

VALINE 654 TO CYSTEINE REPLACEMENT
Taq Pol I_Val654Cys_fwd (SEQ ID NO: 23) 5′-GTC CCC CGG GAG GCC tgt GAC CCC CTG ATG CGC-3′
Taq Pol I_Val654Cys_rev (SEQ ID NO: 24) 5′-GCG CAT CAG GGG GTC aca GGC CTC CCG GGG GAC-3′

ASPARTIC ACID 655 TO CYSTEINE REPLACEMENT
Taq Pol I_D655C_fwd (SEQ ID NO: 25) 5′-CCC CGG GAG GCC GTG tgC CCC CTG ATG CGC CGG-3′
Taq Pol I_D655C_rev (SEQ ID NO: 26) 5′-CCG GCG CAT CAG GGG Gca CAC GGC CTC CCG GGG-3′

PROLINE 656 TO CYSTEINE REPLACEMENT
Taq Pol I_Pro656Cys_fwd (SEQ ID NO: 27) 5′-CGG GAG GCC GTG GAC tgC CTG ATG CGC CGG GCG-3′
Taq Pol I_Pro656Cys_Rev (SEQ ID NO: 28) 5′-CGC CCG GCG CAT CAG Gca GTC CAC GGC CTC CCG-3′

LEUCINE 657 TO CYSTEINE REPLACEMENT
Taq Pol I_Leu657Cys_fwd (SEQ ID NO: 29) 5′-GCC GTG GAC CCC tgc ATG CGC CGG GCG GCC-3′
Taq Pol I_Leu657Cys_rev (SEQ ID NO: 30) 5′-GGC CGC CCG GCG CAT gca GGG GTC CAC GGC-3′

METHIONINE 658 TO CYSTEINE REPLACEMENT
Taq Pol I_Met658Cys_fwd (SEQ ID NO: 31) 5′-GCC GTG GAC CCC CTG tgt CGC CGG GCG GCC-3′
Taq Pol I_Met658Cys_rev (SEQ ID NO: 32) 5′-GGC CGC CCG GCG aca CAG GGG GTC CAC GGC-3′

ARGININE 659 TO CYSTEINE REPLACEMENT
Taq Pol I_Arg659Cys_fwd (SEQ ID NO: 33) 5′-GCC GTG GAC CCC CTG ATG tGC CGG GCG GCC AAG ACC-3′
Taq Pol I_Arg659Cys_rev (SEQ ID NO: 34) 5′-GGT CTT GGC CGC CCG GCa CAT CAG GGG GTC CAC GGC-3′

ARGININE 660 TO CYSTEINE REPLACEMENT
Taq Pol I_Arg660Cys_fwd (SEQ ID NO: 35) 5′-GAC CCC CTG ATG CGC tGc GCG GCC AAG ACC ATC-3′
Taq Pol I_Arg660Cys_rev (SEQ ID NO: 36) 5′-GAT GGT CTT GGC CGC gCa GCG CAT CAG GGG GTC-3′

ALANINE 661 TO CYSTEINE REPLACEMENT
Taq Pol I_Ala661Cys_fwd (SEQ ID NO: 37) 5′-CCC CTG ATG CGC CGG tgc GCC AAG ACC ATC AAC-3′
Taq Pol I_Ala661Cys_rev (SEQ ID NO: 38) 5′-GTT GAT GGT CTT GGC gca CCG GCG CAT CAG GGG-3′
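The design rule stated earlier, that the forward and reverse mutagenic primers are complementary and each carry the mutated codon, can be checked mechanically. The following minimal sketch uses the valine 654 to cysteine primer pair (SEQ ID NOs: 23 and 24) as listed above; the helper names are hypothetical, and the check is offered only as an illustration of the principle.

```python
# Map each base to its complement, preserving upper/lower case.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

# Val654Cys primer pair as listed above (lowercase marks the modified bases).
FWD = "GTC CCC CGG GAG GCC tgt GAC CCC CTG ATG CGC"
REV = "GCG CAT CAG GGG GTC aca GGC CTC CCG GGG GAC"


def reverse_complement(seq):
    """Return the reverse complement of `seq`, ignoring spaces."""
    return seq.replace(" ", "").translate(COMPLEMENT)[::-1]


def are_complementary(fwd, rev):
    """True if `rev` is the exact reverse complement of `fwd` (case-insensitive)."""
    return reverse_complement(fwd).upper() == rev.replace(" ", "").upper()


def modified_codons(primer):
    """Return the space-delimited codons that contain lowercase (modified) bases."""
    return [codon for codon in primer.split() if codon != codon.upper()]


if __name__ == "__main__":
    print("complementary:", are_complementary(FWD, REV))
    # TGT is one of the two standard codons for cysteine (TGT, TGC).
    print("modified codon(s) in forward primer:", modified_codons(FWD))
```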

The resulting mutant Taq polymerases are then reacted with a desired atomic or molecular tag to tag the cysteine in the mutant structure through the SH group of the cysteine residue and screened for native and/or tagged dNTP incorporation and incorporation efficiency. The mutant polymerases are also screened for fluorescent activity during base incorporation. Thus, the present invention also relates to mutant Taq polymerase having a cysteine residue added at one or more of the sites selected from the group consisting of 513-518, 643, 647, 649 and 653-661. After cysteine replacement and verification of polymerase activity using the modified dNTPs, the mutant Taq polymerases are reacted with a tag through the SH group of the inserted cysteine residue. The resulting amino acid replacements for the positions 513-518, 643, 647, 649 and 653-661 are shown below:

(SEQ ID NO: 39)
Cys Ser Trp Met Phe Gly Val Pro Arg Glu Ala Val Asp Pro Leu Met Arg Arg Ala
643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661

(SEQ ID NO: 40)
Ala Ser Trp Met Cys Gly Val Pro Arg Glu Ala Val Asp Pro Leu Met Arg Arg Ala
643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661

(SEQ ID NO: 41)
Ala Ser Trp Met Phe Gly Cys Pro Arg Glu Ala Val Asp Pro Leu Met Arg Arg Ala
643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661

(SEQ ID NO: 42)
Ala Ser Trp Met Phe Gly Val Pro Arg Cys Ala Val Asp Pro Leu Met Arg Arg Ala
643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661

(SEQ ID NO: 43)
Ala Ser Trp Met Phe Gly Val Pro Arg Glu Cys Val Asp Pro Leu Met Arg Arg Ala
643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661

(SEQ ID NO: 44)
Ala Ser Trp Met Phe Gly Val Pro Arg Glu Ala Cys Asp Pro Leu Met Arg Arg Ala
643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661

(SEQ ID NO: 45)
Ala Ser Trp Met Phe Gly Val Pro Arg Glu Ala Val Cys Pro Leu Met Arg Arg Ala
643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661

(SEQ ID NO: 46)
Ala Ser Trp Met Phe Gly Val Pro Arg Glu Ala Val Asp Cys Leu Met Arg Arg Ala
643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661

(SEQ ID NO: 47)
Ala Ser Trp Met Phe Gly Val Pro Arg Glu Ala Val Asp Pro Cys Met Arg Arg Ala
643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661

(SEQ ID NO: 48)
Ala Ser Trp Met Phe Gly Val Pro Arg Glu Ala Val Asp Pro Leu Cys Arg Arg Ala
643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661

(SEQ ID NO: 49)
Ala Ser Trp Met Phe Gly Val Pro Arg Glu Ala Val Asp Pro Leu Met Cys Arg Ala
643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661

(SEQ ID NO: 50)
Ala Ser Trp Met Phe Gly Val Pro Arg Glu Ala Val Asp Pro Leu Met Arg Cys Ala
643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661

(SEQ ID NO: 51)
Ala Ser Trp Met Phe Gly Val Pro Arg Glu Ala Val Asp Pro Leu Met Arg Arg Cys
643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661

(SEQ ID NO: 52)
Cys Thr Ser Ala Ala Val
513 514 515 516 517 518

(SEQ ID NO: 53)
Ser Cys Ser Ala Ala Val
513 514 515 516 517 518

(SEQ ID NO: 54)
Ser Thr Cys Ala Ala Val
513 514 515 516 517 518

(SEQ ID NO: 55)
Ser Thr Ser Cys Ala Val
513 514 515 516 517 518

(SEQ ID NO: 56)
Ser Thr Ser Ala Cys Val
513 514 515 516 517 518

(SEQ ID NO: 57)
Ser Thr Ser Ala Ala Cys
513 514 515 516 517 518

Synthesis of Modified dNTPs

Synthesis of (γ-AmNS)dATP

Nucleotide analogs that contain the fluorophore 1-aminonaphthalene-5-sulfonate attached to the γ-phosphate were synthesized (J. Biol. Chem. 254, 12069-12073, 1979). The dATP analog (γ-AmNS)dATP was synthesized by a slightly modified version of the procedure described by Yarbrough and co-workers for (γ-AmNS)ATP.

This example illustrates the preparation of gamma ANS tagged dATP, shown graphically in FIG. 4.

1-Aminonaphthalene-5-sulfonic acid (447 mg, 2 mmol, 40 eq., from Lancaster) was added to 10 mL of H₂O, and the pH was adjusted to 5.8 with 1 N NaOH. The insoluble material was removed by syringe filter, yielding a solution which was essentially saturated for this pH value (~0.18 to 0.2 M). 4 mL of 12.5 mM 5′-triphosphate-2′-deoxyadenosine disodium salt (0.05 mmol, 1 eq.) and 2 mL of 1 M 1-(3-dimethylaminopropyl-)-3-ethyl-carbodiimide hydrochloride (DEC, 2 mmol, 40 eq., from Lancaster) were added to a reaction vessel at 22° C. The reaction was initiated by adding 10 mL of the 1-aminonaphthalene-5-sulfonate solution and allowed to continue for 2.5 h. The pH was kept between 5.65 and 5.75 by periodic addition of 0.1 N HCl. After 2.5 h, the reaction was diluted to 50 mL and adjusted to a solution of 0.05 M triethylammonium bicarbonate buffer (TEAB, pH ~7.5). The reaction product was chromatographed at low temperature on a 50 mL DEAE-SEPHADEX® ion exchanger (A-25-120) column that was equilibrated with ~pH 7.5, 1.0 M aqueous TEAB (100 mL), 1.0 M aqueous sodium bicarbonate (1100 mL), and ~pH 7.5, 0.05 M aqueous TEAB (100 mL). The column was eluted with a linear gradient of ~pH 7.5 aqueous TEAB from 0.05 to 0.9 M. Approximately 10 mL fractions were collected. Absorbance and fluorescence profiles (UV 366 nm) of the fractions were obtained after appropriate dilution. The fluorescent fraction eluted at 0.7 M buffer after the peak of the unreacted dATP and showed a brilliant blue fluorescence. The product-containing fractions were pooled, dried by lyophilizer and co-evaporated twice with H₂O/ethanol (70/30). The residue was taken up in water and lyophilized. ³¹P NMR (¹H decoupled, 600 MHz, D₂O, Me₃PO₄ external reference, 293 K, pH 6.1) δ (ppm) −12.60, −14.10, −25.79. The reference compound dATP gave the following resonance peaks: ³¹P NMR (dATP, Na⁺) in D₂O, 293 K, δ (ppm) −11.53 (γ), −13.92 (α), −24.93 (β).
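The reagent quantities in the procedure above can be cross-checked with a short calculation. The sketch below is illustrative only; the molecular weight of 1-aminonaphthalene-5-sulfonic acid (223.25 g/mol, computed from the formula C10H9NO3S) is an assumption supplied here and is not stated in the text, and the function names are hypothetical.

```python
ANS_MOLECULAR_WEIGHT = 223.25  # g/mol for C10H9NO3S; assumed, not from the text


def mmol_from_mass(mass_mg, molecular_weight):
    """Millimoles of a solid: mg divided by g/mol gives mmol."""
    return mass_mg / molecular_weight


def mmol_from_solution(volume_ml, concentration_molar):
    """Millimoles in a solution: mL multiplied by mol/L gives mmol."""
    return volume_ml * concentration_molar


if __name__ == "__main__":
    datp = mmol_from_solution(4.0, 0.0125)              # 4 mL of 12.5 mM dATP
    dec = mmol_from_solution(2.0, 1.0)                  # 2 mL of 1 M DEC
    ans = mmol_from_mass(447.0, ANS_MOLECULAR_WEIGHT)   # 447 mg of ANS acid

    print(f"dATP: {datp:.3f} mmol (1 eq.)")
    print(f"DEC:  {dec:.3f} mmol ({dec / datp:.0f} eq.)")
    print(f"ANS:  {ans:.3f} mmol ({ans / datp:.0f} eq.)")
```

With these inputs the carbodiimide and the fluorophore each work out to roughly 40 equivalents relative to dATP, in agreement with the stated stoichiometry.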

Using diode array UV detection HPLC, the fraction containing the desired product was easily identified by the distinct absorption of the ANS group at 366 nm. Additionally, ³¹P NMR spectra were recorded for the γ-phosphate tagged dATP and regular dATP in an aqueous solution. For each compound, three characteristic resonances were observed, confirming the triphosphate moiety in the γ-tagged dATP. The combined analyses (¹H-NMR, HPLC, and UV spectra) provide supporting information for the formation of the correct compound.

The same synthetic procedure was used to prepare γ-ANS-phosphate modified dGTP, dTTP and dCTP.

γ-Phosphate-Tagged dNTP Incorporation by Taq Polymerase

The following examples illustrate that commercially available Taq DNA polymerase efficiently incorporates the ANS-γ-phosphate dNTPs, the synthesis and characterization of which are described above.

The first example illustrates the incorporation of ANS-γ-phosphate dATP to produce extended DNA products from primer templates. The reactions were carried out in extension buffer and the resulting radiolabeled products were size separated on a 20% denaturing polyacrylamide gel. Data were collected using a phosphorimaging system. Referring now to FIG. 5, Lane 1 contains 5′ radiolabeled ‘TOP’ probe in extension buffer. Lane 2 contains Taq DNA polymerase, 50 μM dGTP incubated with a DNA duplex (radiolabeled TOP with excess ‘BOT-Sau’). Lane 3 contains Taq DNA polymerase, 50 μM dATP incubated with a DNA duplex (radiolabeled TOP with excess ‘BOT-Sau’). Lane 4 contains Taq DNA polymerase, 50 μM ANS-γ-dATP incubated with a DNA duplex (radiolabeled TOP with excess ‘BOT-Sau’). Lane 5 contains Taq DNA polymerase, 50 μM dGTP incubated with a DNA duplex (radiolabeled TOP with excess ‘BOT-T’). Lane 6 contains spill-over from lane 5. Lane 7 contains Taq DNA polymerase, 50 μM dATP incubated with a DNA duplex (radiolabeled TOP with excess ‘BOT-T’). Lane 8 contains Taq DNA polymerase, 50 μM ANS-γ-dATP incubated with a DNA duplex (radiolabeled TOP with excess ‘BOT-T’). Lane 9 contains Taq DNA polymerase, 50 μM dGTP incubated with a DNA duplex (radiolabeled TOP with excess ‘BOT-3T’). Lane 10 contains Taq DNA polymerase, 50 μM dATP incubated with a DNA duplex (radiolabeled TOP with excess ‘BOT-3T’). Lane 11 contains Taq DNA polymerase, ANS-γ-dATP incubated with a DNA duplex (radiolabeled TOP with excess ‘BOT-3T’). Lane 12 contains 5′ radiolabeled ‘TOP’ probe in extension buffer. Lane 13 contains 5′ radiolabeled ‘TOP’ probe and Taq DNA polymerase in extension buffer. Oligonucleotide sequences are shown in Table V.

Quantitative comparison of lane 1 with lane 4 demonstrates that very little non-specific, single-base extension was detected when ANS-γ-dATP was included in the reaction although the first incorporated base should have been dGTP (which was not added to the reaction). Quantitative analysis of lanes 1 and 8 demonstrates that approximately 71% of the TOP primer is extended by a template-directed single base when ANS-γ-dATP was included in the reaction and the first incorporated base should be dATP. Thus, Taq DNA polymerase incorporates γ-tagged nucleotides. Equally important to the polymerase's ability to incorporate a γ-tagged nucleotide is its ability to extend the DNA polymer after the modified dATP is incorporated. Comparison of lane 1 with lane 11 demonstrated that a DNA strand was extended after a γ-tagged nucleotide was incorporated. Thus, incorporation of a modified nucleotide was not detrimental to polymerase activity. Note, too, that extension of the primer strand by incorporation of an ANS-γ-nucleotide depended upon Watson-Crick base-pairing rules. In fact, the fidelity of nucleotide incorporation was increased at least 15-fold by the addition of this tag to the γ-phosphate.

This next example illustrates the synthesis of extended DNA polymers using all four ANS tagged γ-phosphate dNTPs. Products generated in these reactions were separated on a 20% denaturing polyacrylamide gel, and the gel was dried and imaged following overnight exposure to a Fuji BAS1000 imaging plate. FIG. 6 shows an image of (A) the actual gel, (B) a lightened phosphorimage and (C) an enhanced phosphorimage. Lane descriptions for A, B, and C follow: Lane 1 is the control containing purified 10-base primer extended to 11 and 12 bases by template-mediated addition of α-³²P dCTP. Lane 2 includes the same primer that was incubated with double-stranded plasmid DNA at 96° C. for 3 minutes (to denature template); the reaction was brought to 37° C. (to anneal primer-template), Taq DNA polymerase and all four natural dNTPs (100 μM, each) were added and the reaction was incubated at 37° C. for 60 minutes. Lane 3 includes the same labeled primer that was incubated with double-stranded DNA plasmid at 96° C. for 3 minutes; the reaction was brought to 37° C., Taq DNA polymerase and all four gamma-modified dNTPs (100 μM, each) were added and the reaction was incubated at 37° C. for 60 minutes. Lane 4 includes the control, purified 10-base primer that was extended to 11 and 12 bases by the addition of α-³²P-dCTP and was cycled in parallel with the reactions in lanes 5-8. Lane 5 includes the same ³²P-labeled primer that was incubated with double-stranded plasmid DNA at 96° C. for 3 minutes; the reaction was brought to 37° C. for 10 minutes, during which time Taq DNA polymerase and all four natural dNTPs (100 μM, each) were added. The reaction was cycled 25 times at 96° C. for 10 seconds, 37° C. for 1 minute, and 70° C. for 5 minutes. Lane 6 includes the same ³²P-labeled primer that was incubated with double-stranded plasmid DNA at 96° C. for 3 minutes; the reaction was brought to 37° C. for 10 minutes, during which time Taq DNA polymerase and all four gamma-modified dNTPs (100 μM, each) were added. The reaction was cycled 25 times at 96° C. for 10 seconds, 37° C. for 1 minute, and 70° C. for 5 minutes. Lane 7 includes nonpurified, 10-base, ³²P-labeled primer that was incubated with double-stranded DNA plasmid at 96° C. for 3 minutes; the reaction was brought to 37° C. for 10 minutes, during which time Taq DNA polymerase and all four natural dNTPs (100 μM, each) were added. The reaction was cycled 25 times at 96° C. for 10 seconds, 37° C. for 1 minute, and 70° C. for 5 minutes. Lane 8 includes nonpurified, 10-base, ³²P-labeled primer that was incubated with double-stranded DNA plasmid at 96° C. for 3 minutes; the reaction was brought to 37° C. for 10 minutes, during which time Taq DNA polymerase and all four gamma-modified dNTPs were added. The reaction was cycled 25 times at 96° C. for 10 seconds, 37° C. for 1 minute, and 70° C. for 5 minutes. Evident in the reactions involving tagged dNTPs is a substantial decrease in pyrophosphorolysis as compared to reactions involving natural nucleotides.
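The cycled reactions described above share one temperature program, which can be written down as data to make its duration explicit. The sketch below is illustrative only; it counts the programmed hold times and ignores temperature ramping, and the variable and function names are hypothetical.

```python
# (step description, temperature in degrees C, hold time in seconds)
INITIAL_STEPS = [
    ("denature template", 96, 3 * 60),
    ("anneal; add polymerase and dNTPs", 37, 10 * 60),
]
CYCLE_STEPS = [
    ("denature", 96, 10),
    ("anneal", 37, 60),
    ("extend", 70, 5 * 60),
]
NUMBER_OF_CYCLES = 25


def cycling_seconds(cycle_steps, cycles):
    """Total hold time spent in the repeated cycles."""
    return cycles * sum(seconds for _, _, seconds in cycle_steps)


def total_seconds(initial_steps, cycle_steps, cycles):
    """Total hold time including the initial denaturation and annealing steps."""
    initial = sum(seconds for _, _, seconds in initial_steps)
    return initial + cycling_seconds(cycle_steps, cycles)


if __name__ == "__main__":
    cycled = cycling_seconds(CYCLE_STEPS, NUMBER_OF_CYCLES)
    total = total_seconds(INITIAL_STEPS, CYCLE_STEPS, NUMBER_OF_CYCLES)
    print(f"cycling only: {cycled} s ({cycled / 60:.1f} min)")
    print(f"with initial steps: {total} s ({total / 60:.1f} min)")
```

The 25 cycles alone account for a little over two and a half hours of hold time, compared with the 60-minute single incubation used for the non-cycled lanes.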

This next example illustrates the synthesis of long DNA polymers using all four ANS tagged γ-phosphate dNTPs. Each primer extension reaction was split into two fractions, and one fraction was electrophoresed through a 20% denaturing gel (as described above), while the other was electrophoresed through a 6% denaturing gel to better estimate product lengths. The gel was dried and imaged by overnight exposure to a Fuji BAS1000 imaging plate. Referring now to FIG. 7, an image of (A) the actual gel, (B) a lightened phosphorimage of the actual gel, and (C) an enhanced phosphorimage of the actual gel is shown. Lane descriptions for A, B, and C follow: Lane 1 includes 123 Marker with size standards indicated at the left of each panel. Lane 2 contains the control, purified 10-base primer extended to 11 and 12 bases by template-mediated addition of α-³²P dCTP. Lane 3 contains the same ³²P-labeled primer that was incubated with double-stranded plasmid DNA at 96° C. for 3 minutes (to denature template); the reaction was brought to 37° C. (to anneal primer-template), Taq DNA polymerase and all four natural dNTPs (100 μM, each) were added, and the reaction was incubated at 37° C. for 60 minutes. Lane 4 includes the same ³²P-labeled primer that was incubated with double-stranded DNA plasmid at 96° C. for 3 minutes; the reaction was brought to 37° C., Taq DNA polymerase and all four gamma-modified dNTPs (100 μM, each) were added, and the reaction was incubated at 37° C. for 60 minutes. Lane 5 includes the control, purified 10-base primer that was extended to 11 and 12 bases by the addition of α-³²P-dCTP and was cycled in parallel with the reactions of lanes 6-9. Lane 6 includes the same ³²P-labeled primer that was incubated with double-stranded plasmid DNA at 96° C. for 3 minutes; the reaction was brought to 37° C. for 10 minutes, during which time Taq DNA polymerase and all four natural dNTPs (100 μM, each) were added. The reaction was cycled 25 times at 96° C. for 10 seconds, 37° C. for 1 minute, and 70° C. for 5 minutes. Lane 7 includes the same ³²P-labeled primer that was incubated with double-stranded plasmid DNA at 96° C. for 3 minutes; the reaction was brought to 37° C. for 10 minutes, during which time Taq DNA polymerase and all four gamma-modified dNTPs (100 μM, each) were added. The reaction was cycled 25 times at 96° C. for 10 seconds, 37° C. for 1 minute, and 70° C. for 5 minutes. Lane 8 includes nonpurified, 10-base, ³²P-labeled primer that was incubated with double-stranded DNA plasmid at 96° C. for 3 minutes; the reaction was brought to 37° C. for 10 minutes, during which time Taq DNA polymerase and all four natural dNTPs (100 μM, each) were added. The reaction was cycled 25 times at 96° C. for 10 seconds, 37° C. for 1 minute, and 70° C. for 5 minutes. Lane 9 includes nonpurified, 10-base, ³²P-labeled primer that was incubated with double-stranded DNA plasmid at 96° C. for 3 minutes; the reaction was brought to 37° C. for 10 minutes, during which time Taq DNA polymerase and all four gamma-modified dNTPs were added. The reaction was cycled 25 times at 96° C. for 10 seconds, 37° C. for 1 minute, and 70° C. for 5 minutes.

The majority of extension products in this reaction are several hundred bases long for both natural and γ-modified dNTPs, and a significant percentage of these products are too large to enter the gel. This demonstrates that the gamma phosphate tagged dNTPs are used by Taq polymerase to generate long DNA polymers that are non-tagged or native DNA polymer chains.

Different Polymerases React Differently to the Gamma-Modified Nucleotides

The indicated enzymes (Taq DNA Polymerase, SEQUENASE®, HIV-1 Reverse Transcriptase, T7 DNA Polymerase, Klenow Fragment, Pfu DNA Polymerase) were incubated in the manufacturer's suggested reaction buffer with 50 μM of the indicated nucleotide at 37° C. for 30-60 minutes, and the reaction products were analyzed by size separation through a 20% denaturing gel.

Taq DNA polymerase efficiently uses the gamma-modified nucleotides to synthesize extended DNA polymers at increased accuracy as shown in FIGS. 4-6.

The Klenow fragment from E. coli DNA polymerase I efficiently uses the gamma-modified nucleotides, but does not exhibit the extreme fidelity improvements observed with other enzymes as shown in FIG. 8.

Pfu DNA polymerase does not efficiently use gamma-modified nucleotides and is, thus, not a preferred enzyme for the single-molecule sequencing system as shown in FIG. 9.

HIV-1 reverse transcriptase efficiently uses the gamma-tagged nucleotides, and significant fidelity improvement results as shown in FIG. 10.

Polymerization activity is difficult to detect in the reaction products generated by native T7 DNA polymerase (due to the presence of the enzyme's exonuclease activity). However, its genetically modified derivative, SEQUENASE®, shows that the gamma-modified nucleotides are efficiently incorporated, and that incorporation fidelity is improved, relative to non-modified nucleotides. The experimental results for native T7 DNA polymerase and SEQUENASE® are shown in FIG. 11.

Thus, for the Taq polymerase or the HIV-1 reverse transcriptase, improved fidelity, due to the use of the gamma-modified dNTPs of this invention, enables single-molecule DNA sequencing. However, not all polymerases utilize the gamma-modified nucleotides of this invention equally. Specifically, Klenow, SEQUENASE®, HIV-1 reverse transcriptase and Taq polymerases incorporate the modified nucleotides of this invention, while the Pfu DNA polymerase does not appear to incorporate them.

Improved PCR-Generation of Long DNA Sequences

The fidelity of nucleic acid synthesis is a limiting factor in achieving amplification of long target molecules using PCR. The misincorporation of nucleotides during the synthesis of primer extension products limits the length of target that can be efficiently amplified. The effect on primer extension of a 3′-terminal base that is mismatched with the template is described in Huang et al., 1992, Nucl. Acids Res. 20:4567-4573, incorporated herein by reference. The presence of misincorporated nucleotides may result in prematurely terminated strand synthesis, reducing the number of template strands for future rounds of amplification, and thus reducing the efficiency of long target amplification. Even low levels of nucleotide misincorporation may become critical for sequences longer than 10 kb. The data in FIG. 4 show that the fidelity of DNA synthesis using gamma tagged dNTPs is improved for the native Taq polymerase, making longer DNA extension possible without the need for adding polymerases with 3′-to-5′ exonuclease, or “proofreading”, activity as required in the long-distance PCR method developed by Cheng et al., U.S. Pat. No. 5,512,462, incorporated herein by reference. Thus, the present invention provides an improved PCR system for generating increased extension length PCR amplified DNA products comprising contacting a native Taq polymerase with gamma tagged dNTPs of this invention under PCR reaction conditions. The extended length PCR products are due to improved accuracy of base incorporation, resulting from the use of the gamma-modified dNTPs of this invention.
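The relationship between misincorporation and achievable product length can be illustrated with a simple calculation. The following Python sketch assumes, purely for illustration, that each incorporated base is misincorporated independently with a fixed probability e, so that the fraction of strands of length L containing no misincorporated base is (1 - e)^L. The function name and the error rates are hypothetical placeholders; they are not measurements from FIG. 4 or from any polymerase described herein.

```python
import math


def error_free_fraction(error_rate: float, length: int) -> float:
    """Fraction of strands of `length` bases with no misincorporation.

    Assumes independent errors at a fixed per-base rate (illustrative model).
    log1p keeps the result accurate when error_rate is very small.
    """
    if not 0.0 <= error_rate < 1.0:
        raise ValueError("error_rate must be in [0, 1)")
    if length < 0:
        raise ValueError("length must be non-negative")
    return math.exp(length * math.log1p(-error_rate))


if __name__ == "__main__":
    # Hypothetical per-base error rates, chosen only to show the trend.
    for rate in (1e-4, 1e-5):
        for length in (1_000, 10_000, 30_000):
            frac = error_free_fraction(rate, length)
            print(f"rate={rate:g} length={length:>6} error-free={frac:.3f}")
```

Under this model, a tenfold reduction in the per-base error rate raises the error-free fraction at 10 kb from roughly one third to roughly nine tenths, which is consistent with the statement above that even low misincorporation levels become critical for long targets.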

Signal Intensity and Reaction Kinetics Provide Information Concerning Base Identity

Signal intensities for each nucleotide in the extended DNA strand are used to determine, confirm or support base identity data. Referring now to FIG. 12, the solid line corresponds to reaction products produced when the four natural nucleotides (dATP, dCTP, dGTP and dTTP) are included in the synthesis reaction. The dashed or broken line corresponds to reaction products produced when proprietary, base-modified nucleotides are included in the reaction. As is clearly demonstrated, sequence context and base modification(s) influence reaction product intensity and/or kinetics, and these identifying patterns are incorporated into proprietary base-calling software to provide a high confidence value for base identity at each sequenced position.
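One way in which such intensity patterns could be converted into a confidence value is sketched below in Python. This is not the proprietary base-calling software referred to above; it is a minimal illustration which assumes that each base, in a given sequence context, produces a signal intensity that is normally distributed about a reference mean, and which assigns equal prior probability to each base. The reference values, the observed intensity, and the names `gaussian_pdf`, `call_base` and `REFERENCE` are all hypothetical.

```python
import math


def gaussian_pdf(x: float, mean: float, sd: float) -> float:
    """Normal probability density at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def call_base(observed: float, reference: dict) -> tuple:
    """Return (base, confidence) for one observed intensity.

    `reference` maps each base to a (mean, sd) pair for the current
    sequence context. Confidence is the normalized likelihood of the
    best base, assuming equal prior probability for every base.
    """
    likelihoods = {
        base: gaussian_pdf(observed, mean, sd)
        for base, (mean, sd) in reference.items()
    }
    total = sum(likelihoods.values())
    if total == 0.0:
        raise ValueError("observed intensity is too far from every reference")
    best = max(likelihoods, key=likelihoods.get)
    return best, likelihoods[best] / total


# Hypothetical reference intensities (arbitrary units) for one context.
REFERENCE = {
    "A": (1.00, 0.10),
    "C": (0.60, 0.10),
    "G": (0.80, 0.10),
    "T": (0.40, 0.10),
}

if __name__ == "__main__":
    base, confidence = call_base(0.62, REFERENCE)
    print(base, round(confidence, 3))
```

A kinetic measurement could be added in the same way, by multiplying in a second likelihood term for each base before normalizing.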

All references cited herein and listed in are incorporated by reference. While this invention has been described fully and completely, it should be understood that, within the scope of the appended claims, the invention may be practiced otherwise than as specifically described. Although the invention has been disclosed with reference to its preferred embodiments, from reading this description those of skill in the art may appreciate changes and modifications that may be made which do not depart from the scope and spirit of the invention as described above and claimed hereafter.

1. A method of sequencing nucleic acid molecules at the single molecule level, comprising: immobilizing a member of a replication complex comprising a polymerizing agent, an oligonucleotide primer and a nucleic acid template on or in a substrate; contacting the immobilized member with the non-immobilized members of the replication complex to form an immobilized replication complex; incubating the immobilized replication complex with monomers for the polymerizing agent, where at least one of the monomer types includes a monomer tag covalently bonded to a site on the monomer that is not incorporated into a growing complementary nucleic acid sequence, where the monomer tag has a detectable property capable of being detected by a detector; detecting a change in the detectable property of each monomer tag as a tagged monomer is incorporated by the polymerizing agent into the growing complementary nucleic acid sequence; and converting the detected changes in the detectable property of the monomer tags to an identity of one monomer or a plurality of monomers corresponding to one nucleotide or a plurality of nucleotides of the template.
 2. A method of sequencing nucleic acid molecules comprising the steps of: confining a plurality of polymerizing agents on or in a substrate to form a plurality of confined polymerizing agents; contacting the confined polymerizing agents with a solution including a nucleic acid template and oligonucleotide primers, where the primers are adapted to duplex with a portion of the templates to form extendable nucleic acid duplexes and the duplexes are adapted to complex with some or all of the confined polymerizing agents to form confined replication complexes; incubating the confined replication complexes with monomer types for the polymerizing agent, where at least two of the monomer types include unique monomer tags covalently bonded to monomer sites on the monomer types that are not incorporated into a growing complementary nucleic acid sequence and where the monomer tags have a detectable property capable of being detected by a detector; detecting a change in the detectable property of each monomer tag as a tagged monomer is incorporated by the polymerizing agent into the growing complementary nucleic acid sequence; and converting the detected changes in the detectable property of the incorporated tagged monomer to an identity of one nucleotide or a plurality of nucleotides of the template.
 3. A method of sequencing nucleic acid molecules comprising the steps of: confining a plurality of polymerizing agents on or in a substrate to form a plurality of confined polymerizing agents, where each polymerizing agent includes a polymerizing agent donor fluorescent tag covalently bonded to a site on the polymerizing agent or associated with a molecule associated with the polymerizing agent; contacting the confined polymerizing agents with a solution including a nucleic acid template and oligonucleotide primers, where the primers are adapted to duplex with a portion of the template to form extendable nucleic acid duplexes and the duplexes are adapted to complex with some or all of the confined polymerizing agents to form confined replication complexes; incubating the confined replication complex with four nucleotide types for the polymerizing agent, where a first nucleotide type includes a first acceptor fluorescent tag covalently bonded to a site thereof and a second nucleotide type includes a second acceptor fluorescent tag covalently bonded to a site thereof and where the nucleotide tags are not incorporated by the polymerizing agent into a growing complementary nucleic acid sequence and where the nucleotide tags are capable of undergoing fluorescence resonance energy transfer (FRET) with the donor tag and where the acceptor tags are the same or different; detecting fluorescent light emitted by each monomer tag as a tagged nucleotide is incorporated by a polymerizing agent into a growing complementary nucleic acid sequence via an FRET interaction between the incorporating acceptor tag on the tagged nucleotide and the donor tag on the polymerizing agent incorporating the tagged monomer to produce data evidencing a sequence of tagged monomer incorporation events; and converting the FRET data into an identity of one nucleotide or a plurality of nucleotides of the template.
 4. A method of sequencing nucleic acid molecules comprising: confining a plurality of polymerizing agents on or in a substrate to form a plurality of confined polymerizing agents, where each polymerizing agent includes a polymerizing agent donor fluorescent tag covalently bonded to a site on the polymerizing agent or associated with a molecule associated with the polymerizing agent; contacting the confined polymerizing agents with a solution including a nucleic acid template and oligonucleotide primers, where the primers are adapted to duplex with a portion of the template to form extendable nucleic acid duplexes and the duplexes are adapted to complex with some or all of the confined polymerizing agents to form confined replication complexes; incubating the confined replication complex with four dNTP types for the polymerizing agent, where a first monomer type includes a first acceptor fluorescent tag covalently bonded to a site thereof, a second monomer type includes a second acceptor fluorescent tag covalently bonded to a site thereof, and a third monomer type includes a third acceptor fluorescent tag covalently bonded to a site thereof, and where the monomer tags are not incorporated by the polymerizing agent into a growing complementary nucleic acid sequence, and where the monomer tags are capable of undergoing fluorescence resonance energy transfer (FRET) with the donor tag and where the acceptor tags are the same or different; detecting fluorescent light emitted by each monomer tag as a tagged monomer is incorporated by a polymerizing agent into a growing complementary nucleic acid sequence via an FRET interaction between the incorporating acceptor tag on the monomer and the donor tag on the polymerizing agent incorporating the tagged monomer to produce data evidencing a sequence of monomer incorporation events; and converting the FRET data into an identity of one nucleotide or a plurality of nucleotides of the template.
 5. A method of sequencing nucleic acid molecules comprising: confining a plurality of polymerizing agents on or in a substrate to form a plurality of immobilized polymerizing agents, where each polymerizing agent includes a polymerizing agent donor fluorescent tag covalently bonded to a site on the polymerizing agent or associated with a molecule associated with the polymerizing agent; contacting the confined polymerizing agents with a solution including a nucleic acid template and oligonucleotide primers, where the primers are adapted to duplex with a portion of the template to form extendable nucleic acid duplexes and the duplexes are adapted to complex with some or all of the confined polymerizing agents to form confined replication complexes; incubating the confined replication complex with four dNTP types for the polymerizing agent, where a first monomer type includes a first acceptor fluorescent tag covalently bonded to a site thereof, a second monomer type includes a second acceptor fluorescent tag covalently bonded to a site thereof, a third monomer type includes a third acceptor fluorescent tag covalently bonded to a site thereof, and a fourth monomer type includes a fourth acceptor fluorescent tag covalently bonded to a site thereof, and where the monomer tags are not incorporated by the polymerizing agent into a growing complementary nucleic acid sequence, and where the monomer tags are capable of undergoing fluorescence resonance energy transfer (FRET) with the donor tag and where the acceptor tags are the same or different; detecting fluorescent light emitted by each monomer tag as a tagged monomer is incorporated by a polymerizing agent into a growing complementary nucleic acid sequence via an FRET interaction between the incorporating acceptor tag on the monomer and the donor tag on the polymerizing agent incorporating the tagged monomer to produce data evidencing a sequence of monomer incorporation events; and converting the FRET data into an identity of one nucleotide or a plurality of nucleotides of the template.
 6. A method comprising the steps of: confining a polymerizing agent on or in a substrate; incubating the polymerizing agent in the presence of a template polymer comprising a sequence of the monomers, optionally primers adapted to duplex with a portion of the template polymer, and monomers for the polymerizing agent, where each monomer type includes a unique monomer tag covalently bonded to a site of the monomer that is not incorporated by the polymerizing agent into a growing complementary polymer comprising a sequence of the monomers complementary to the template polymer and each monomer tag has a detectable property capable of being detected by a detector; detecting a change in the detectable property as each monomer is incorporated by the polymerizing agent into the growing complementary polymer; and converting the detected changes in the detectable property of the monomer tag of each incorporated monomer to a sequence of monomers in the template polymer.
 7. A method of sequencing nucleic acid molecules at the single molecule level, comprising the steps of: confining a member of a replication complex comprising a polymerizing agent, a primer and a template in a region, area, well, groove, channel or other similar structure on the substrate capable of being filled with an appropriate polymerizing medium; contacting the confined member with the other members of the replication complex to form a confined replication complex; incubating the confined replication complex with monomers for the polymerizing agent in the presence of the medium, where at least one of the monomer types includes a monomer tag attached to a site on the monomer that is not incorporated into a growing complementary sequence of monomers, where the monomer tag has a detectable property capable of being detected by a detector; detecting a change in the detectable property of each monomer tag as a tagged monomer is incorporated by the polymerizing agent into the growing complementary sequence; and converting the detected changes in the detectable property to an identity of one monomer or a plurality of monomers corresponding to one nucleotide or a plurality of nucleotides of the template.